Measuring the similarity and relatedness of concepts in the medical domain: IHI 2012 tutorial overview

The ability to quantify the degree to which concepts are similar or related to each other is a key component in many Natural Language Processing (NLP) and Artificial Intelligence (AI) applications. For example, in a document search application, it can be very useful to identify text snippets that contain terms that are similar (but not identical) to those provided by a user. This tutorial will introduce the theory behind measures of semantic similarity and relatedness, and show how these can be applied in the medical domain by using freely available open-source software, UMLS::Similarity (http://umls-similarity.sourceforge.net). This software takes advantage of the Unified Medical Language System (UMLS, http://www.nlm.nih.gov/research/umls/), which is maintained by the National Library of Medicine (USA). The tutorial will also show how to evaluate existing measures with manually created reference standards.
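As a rough illustration of the two ideas in the overview (scoring concept pairs, then checking the scores against a reference standard), here is a minimal Python sketch. It does not use UMLS::Similarity or the UMLS itself. The toy is-a hierarchy, the function names, and the human ratings are all hypothetical, and the measure shown (the reciprocal of the number of nodes on the shortest is-a path) is just one simple path-based choice made for illustration.

```python
# Hypothetical toy is-a hierarchy (child -> parent); NOT taken from the UMLS.
PARENT = {
    "myocardial infarction": "heart disease",
    "angina": "heart disease",
    "heart disease": "cardiovascular disease",
    "stroke": "cardiovascular disease",
    "cardiovascular disease": "disease",
    "asthma": "respiratory disease",
    "respiratory disease": "disease",
}


def ancestors(concept):
    """Return the chain from a concept up to the root, inclusive."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a, b):
    """Reciprocal of the number of nodes on the shortest is-a path from a to b."""
    up_a = ancestors(a)
    steps_from_a = {c: i for i, c in enumerate(up_a)}
    for steps_from_b, c in enumerate(ancestors(b)):
        if c in steps_from_a:  # first shared ancestor is the lowest one in a tree
            return 1.0 / (steps_from_a[c] + steps_from_b + 1)
    return 0.0  # no shared ancestor


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical reference standard: made-up human ratings for three term pairs.
REFERENCE = [
    ("myocardial infarction", "angina", 3.5),
    ("myocardial infarction", "stroke", 3.0),
    ("myocardial infarction", "asthma", 1.0),
]

measure_scores = [path_similarity(a, b) for a, b, _ in REFERENCE]
human_scores = [rating for _, _, rating in REFERENCE]
agreement = spearman(measure_scores, human_scores)
```

Rank correlation is used here because it compares only the ordering of the pairs, so the measure and the human ratings do not need to share a scale.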
