Multi-information Source HIN for Medical Concept Embedding

Learning low-dimensional representations of medical concepts is important for public healthcare applications such as computer-aided diagnosis systems. Existing methods rely on Electronic Health Records (EHR) as their only information source and do not use the abundant external medical knowledge that is available, so they ignore the correlations between medical concepts. To address this issue, we propose a novel multi-information source Heterogeneous Information Network (HIN) that models EHR while incorporating external medical knowledge, including ICD-9-CM and MeSH, for an enriched network schema. Our model captures both the structure of EHR and the correlations between the medical concepts it refers to, and learns medical concept embeddings that reflect their semantics. In experiments, our model outperforms unsupervised baselines on a variety of medical data mining tasks.
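
To make the idea of an enriched network schema concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy EHR of two hypothetical patients, uses only the ICD-9-CM code hierarchy as external knowledge (MeSH is omitted), and uses a metapath-guided random walk as a stand-in for whatever sampling strategy the full model applies. The names `ehr_records`, `build_hin`, and `metapath_walk` are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical toy EHR: patient id -> ICD-9-CM diagnosis codes (assumed data).
ehr_records = {
    "patient_1": ["250.00", "401.9"],
    "patient_2": ["401.9", "428.0"],
}


def build_hin(records):
    """Build an undirected typed graph; nodes are (node_type, node_id) tuples."""
    adj = defaultdict(set)

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for patient, codes in records.items():
        for code in codes:
            # Edge taken from the EHR itself: patient -- diagnosis.
            add_edge(("patient", patient), ("diagnosis", code))
            # Edge taken from external knowledge: a diagnosis code is linked
            # to its three-digit ICD-9-CM category (the part before the dot).
            add_edge(("diagnosis", code), ("category", code.split(".")[0]))
    return adj


def metapath_walk(adj, start, metapath, length, rng):
    """Random walk where step i may only move to a node of type
    metapath[i % len(metapath)]; stops early if no such neighbour exists."""
    walk = [start]
    for step in range(1, length):
        wanted = metapath[step % len(metapath)]
        candidates = sorted(n for n in adj[walk[-1]] if n[0] == wanted)
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


hin = build_hin(ehr_records)
# Alternate diagnosis -> patient -> diagnosis -> ... starting from one code.
walk = metapath_walk(
    hin, ("diagnosis", "250.00"), ["diagnosis", "patient"], 5, random.Random(0)
)
print(walk)
```

Walks of this kind could be fed to a skip-gram model to obtain embeddings; the category nodes are what let two codes that never co-occur in a record still end up connected through shared external knowledge.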
