Combining Clinical Data and Domain Knowledge for Analyzing Mental Disorder Concept Relatedness and Usage

Concept relatedness plays a significant role in the biomedical domain, as it facilitates a number of tasks, including information extraction, natural language processing, concept clustering, and classification. In this research, we leveraged concept embeddings to measure concept relatedness and compared the embedding-based relatedness with a gold-standard concept classification. We also used real-world data from the mental health domain to measure concept usage. The results revealed a compatibility accuracy, measured by F score, of 0.21, and showed that about 20% of SNOMED Mental Disorder concepts are used for the patient cohort selection task. This study contributes a method for exploring concept relatedness and usage and for examining different approaches to medical concept classification, providing insights for medical ontology developers and domain experts.
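The abstract does not specify the embedding model, the grouping method, or the exact definition of the compatibility F score. The sketch below makes three illustrative assumptions: relatedness is the cosine similarity between concept vectors, related concepts are grouped by linking every pair above a similarity threshold (connected components), and agreement with the gold-standard classification is a pair-counting F score. It also computes concept usage as the fraction of ontology concepts that appear in clinical data. The concept names, vectors, categories, and threshold are hypothetical toy values, not the study's data.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def threshold_clusters(embeddings, threshold):
    """Assumed grouping: link concept pairs with cosine >= threshold, return connected components."""
    ids = sorted(embeddings)
    parent = {c: c for c in ids}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(ids, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)
    return {c: find(c) for c in ids}

def pairwise_f_score(predicted, gold):
    """Pair-counting F score: a concept pair is positive when a labeling puts both in one group."""
    tp = fp = fn = 0
    for a, b in combinations(sorted(gold), 2):
        same_pred = predicted[a] == predicted[b]
        same_gold = gold[a] == gold[b]
        if same_pred and same_gold:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def usage_rate(concepts, used):
    """Fraction of ontology concepts that occur at least once in the clinical data."""
    concepts = set(concepts)
    return len(concepts & set(used)) / len(concepts) if concepts else 0.0

# Hypothetical toy embeddings and gold-standard categories (illustration only).
emb = {
    "depression": [1.0, 0.1, 0.0],
    "dysthymia": [0.9, 0.2, 0.0],
    "schizophrenia": [0.0, 1.0, 0.1],
    "psychosis": [0.1, 0.9, 0.2],
}
gold = {
    "depression": "mood",
    "dysthymia": "mood",
    "schizophrenia": "psychotic",
    "psychosis": "psychotic",
}

clusters = threshold_clusters(emb, threshold=0.8)
print(pairwise_f_score(clusters, gold))                 # 1.0
print(usage_rate(gold, ["depression", "anxiety"]))      # 0.25
```

With real data the embedding-based groups will rarely match the gold standard this cleanly; a low F score such as the reported 0.21 indicates that relatedness learned from data and the ontology's classification group concepts quite differently. Pair counting is only one option; information-theoretic measures could serve the same purpose.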
