Disease Prediction via Graph Neural Networks

With the increasing availability of electronic medical records (EMRs), disease prediction has recently attracted considerable research attention. The task is to train an accurate classifier that maps input prediction signals (e.g., symptoms and patient demographics) to the estimated diseases for each patient. However, existing machine learning-based solutions rely heavily on abundant manually labeled EMR training data to ensure accurate predictions, which impedes their performance on rare diseases that suffer from severe data scarcity. For each rare disease, the limited EMR data can hardly offer sufficient information for a model to distinguish it from other diseases with similar clinical symptoms. Furthermore, most existing disease prediction approaches are based on the sequential EMRs collected for each patient and cannot handle new patients without historical EMRs, which reduces their real-life practicality. In this paper, we introduce a model based on Graph Neural Networks (GNNs) for disease prediction. It utilizes external knowledge bases to augment the insufficient EMR data, and learns highly representative node embeddings for patients, diseases, and symptoms from a medical concept graph and a patient record graph, constructed from the medical knowledge base and the EMRs, respectively. By aggregating information from directly connected neighbor nodes, the proposed neural graph encoder generates embeddings that capture knowledge from both data sources, and can inductively infer the embedding of a new patient from the symptoms reported in their EMRs, allowing accurate prediction of both general and rare diseases. Extensive experiments on a real-world EMR dataset demonstrate the state-of-the-art performance of the proposed model.
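The idea of inductively inferring a new patient's embedding from neighboring symptom nodes can be illustrated with a minimal sketch. This is not the paper's encoder: the abstract does not specify the aggregation function, so a plain element-wise mean stands in for it here, and cosine similarity stands in for the prediction step. All names (`mean_aggregate`, `embed_new_patient`, `rank_diseases`), the symptom and disease labels, and the two-dimensional embedding values are hypothetical and chosen only for illustration.

```python
import math

def mean_aggregate(neighbor_embeddings):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(neighbor_embeddings[0])
    count = len(neighbor_embeddings)
    return [sum(vec[i] for vec in neighbor_embeddings) / count for i in range(dim)]

def embed_new_patient(symptoms, symptom_embeddings):
    """Infer an embedding for an unseen patient from their reported symptoms.

    The patient has no historical records, so the embedding is built only
    from the embeddings of the symptom nodes they would connect to.
    """
    known = [symptom_embeddings[s] for s in symptoms if s in symptom_embeddings]
    if not known:
        raise ValueError("patient reports no symptom with a known embedding")
    return mean_aggregate(known)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_diseases(patient_vec, disease_embeddings):
    """Return disease names sorted from most to least similar to the patient."""
    scores = {name: cosine(patient_vec, vec) for name, vec in disease_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings (assumed values, not learned from any dataset).
symptom_embeddings = {
    "fever": [1.0, 0.0],
    "cough": [0.8, 0.2],
    "joint_pain": [0.0, 1.0],
}
disease_embeddings = {
    "disease_A": [1.0, 0.1],
    "disease_B": [0.1, 1.0],
}

# A new patient with no history, described only by reported symptoms.
patient_vec = embed_new_patient(["fever", "cough"], symptom_embeddings)
ranking = rank_diseases(patient_vec, disease_embeddings)
print(patient_vec, ranking)
```

Because the patient embedding is computed from symptom embeddings alone, no retraining is needed when a new patient appears. A learned encoder would typically replace the mean with a trainable aggregation and combine information from both graphs.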
