UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity

A number of computational measures for determining semantic similarity between pairs of biomedical concepts have been developed using various standards and programming platforms. In this paper, we introduce two new open-source frameworks based on the Unified Medical Language System (UMLS). These frameworks consist of the UMLS-Similarity and UMLS-Interface packages. UMLS-Interface provides path information about UMLS concepts. UMLS-Similarity calculates the semantic similarity between UMLS concepts using several previously developed measures and can be extended to include new measures. We validate the functionality of these frameworks by reproducing the results from previous work. Our frameworks constitute a significant contribution to the field of biomedical Natural Language Processing by providing a common development and testing platform for semantic similarity measures based on the UMLS.

[1]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[2]  Olivier Bodenreider,et al.  Aligning Knowledge Sources in the UMLS: Methods, Quantitative Results, and Applications , 2004, MedInfo.

[3]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[4]  Christiane Fellbaum,et al.  Combining Local Context and Wordnet Similarity for Word Sense Identification , 1998 .

[5]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[6]  Mark A. Musen,et al.  UMLS-Query: A Perl Module for Querying the UMLS , 2008, AMIA.

[7]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[8]  Carole A. Goble,et al.  Semantic Similarity Measures as Tools for Exploring the Gene Ontology , 2002, Pacific Symposium on Biocomputing.

[9]  Hisham Al-Mubaid,et al.  New ontology-based semantic similarity measure for the biomedical domain , 2006, 2006 IEEE International Conference on Granular Computing.

[10]  Roy Rada,et al.  Development and application of a metric on semantic nets , 1989, IEEE Trans. Syst. Man Cybern..

[11]  Martha Palmer,et al.  Verb Semantics and Lexical Selection , 1994, ACL.

[12]  Mark A. Musen,et al.  Comparison of Ontology-based Semantic-Similarity Measures , 2008, AMIA.

[13]  James J. Cimino,et al.  Towards the development of a conceptual distance metric for the UMLS , 2004, J. Biomed. Informatics.

[14]  Keke Chen,et al.  Model Formulation: A Document Clustering and Ranking System for Exploring MEDLINE Citations , 2007, J. Am. Medical Informatics Assoc..

[15]  George Hripcsak,et al.  Inter-patient distance metrics using SNOMED CT defining relationships , 2006, J. Biomed. Informatics.

[16]  Ted Pedersen,et al.  Measures of semantic similarity and relatedness in the biomedical domain , 2007, J. Biomed. Informatics.

[17]  Hoa A. Nguyen,et al.  A Cluster-Based Approach for Semantic Similarity in the Biomedical Domain , 2006, 2006 International Conference of the IEEE Engineering in Medicine and Biology Society.