Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records

Phenotype information, which is naturally contained in electronic health records (EHRs), has proven useful in clinical informatics applications such as disease diagnosis. However, annotating phenotypic abnormalities across millions of EHR narratives remains challenging because of imprecise descriptions, the lack of gold standards, and the demand for efficiency. In this work, we propose a novel unsupervised deep learning framework that annotates phenotypic abnormalities in EHRs via semantic latent representations. The framework takes advantage of the Human Phenotype Ontology (HPO), a knowledge base of phenotypic abnormalities, to standardize the annotation results. Experiments were conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative analyses show that the proposed framework achieves state-of-the-art annotation performance and computational efficiency compared with other methods.
