Semantic relatedness study using second order co-occurrence vectors computed from biomedical corpora, UMLS and WordNet

Automated measures of semantic relatedness are important for effectively processing medical data in tasks such as information retrieval and natural language processing. In this paper, we present a context vector approach that can compute the semantic relatedness between any pair of concepts in the Unified Medical Language System (UMLS). Our approach was developed on a corpus of inpatient clinical reports. We use 430 pairs of clinical concepts manually rated for semantic relatedness as the reference standard. The experiments demonstrate that combining UMLS and WordNet definitions can improve the semantic relatedness estimates. The paper also shows that the second-order co-occurrence vector measure is more effective than path-based methods for measuring semantic relatedness.
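To illustrate the general idea of a second-order co-occurrence vector measure, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes that each concept has a textual definition, that first-order vectors are raw co-occurrence counts within a fixed window, and that a concept vector is the sum of the first-order vectors of its definition words. The toy corpus, the definitions, the window size, and all function names are hypothetical and chosen only for illustration; a realistic setup would likely add stop-word filtering and frequency cutoffs.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """First-order vectors: for each word, count the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def second_order_vector(definition, first_order):
    """Second-order vector: sum of the first-order vectors of the definition's words."""
    total = Counter()
    for word in definition.lower().split():
        total.update(first_order.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus standing in for a clinical text collection.
corpus = [
    "patient reports chest pain and shortness of breath",
    "chest pain radiates to the left arm",
    "heart attack causes chest pain",
    "fracture of the left femur after fall",
    "femur fracture treated with surgery",
]

# Hypothetical definitions standing in for UMLS or WordNet glosses.
definitions = {
    "myocardial_infarction": "heart attack chest pain",
    "angina": "chest pain",
    "femur_fracture": "fracture femur",
}

first_order = cooccurrence_vectors(corpus)


def relatedness(concept_a, concept_b):
    """Relatedness of two concepts as the cosine of their second-order vectors."""
    vec_a = second_order_vector(definitions[concept_a], first_order)
    vec_b = second_order_vector(definitions[concept_b], first_order)
    return cosine(vec_a, vec_b)


print(relatedness("myocardial_infarction", "angina"))
print(relatedness("myocardial_infarction", "femur_fracture"))
```

In this sketch, extending a concept's definition (for example, by concatenating glosses from two sources) simply adds more first-order vectors to the sum, which is one plausible way to picture how combining definitions could change the resulting scores.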
