Evaluating Semantic Relatedness and Similarity Measures with Standardized MedDRA Queries

A potential use of automated concept similarity and relatedness measures is to improve automatic detection of clinical text that relates to a condition indicative of an adverse drug reaction. This is also one of the purposes of the Medical Dictionary for Regulatory Activities (MedDRA) Standardized Queries (SMQ). An expert panel evaluates SMQs for their ability to detect a condition of interest and thus qualifies them as a reference standard for evaluating automated approaches. We compare similarity and relatedness measurement methods on rates of correctly identifying intra-category and inter-category concept pairs from SMQ data to create ROC curves of each method's sensitivity and specificity. Results indicate an information content measure, specifically the Resnik method, achieved the highest results as measured by area under the curve, but using two different measures as predictors, Resnik and Lin, obtained the highest score. Overall, using SMQ data resulted in a productive method of evaluating automated semantic relatedness and similarity scores.

[1]  Ted Pedersen,et al.  Towards a framework for developing semantic relatedness reference standards , 2011, J. Biomed. Informatics.

[2]  Ted Pedersen,et al.  UMLS-Interface and UMLS-Similarity : Open Source Software for Measuring Paths and Semantic Similarity , 2009, AMIA.

[3]  Roy Rada,et al.  Development and application of a metric on semantic nets , 1989, IEEE Trans. Syst. Man Cybern..

[4]  Terrence Adam,et al.  Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study. , 2010, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[5]  David Madigan,et al.  Influence of the MedDRA® hierarchy on pharmacovigilance data mining results , 2009, Int. J. Medical Informatics.

[6]  Xavier Robin,et al.  pROC: an open-source package for R and S+ to analyze and compare ROC curves , 2011, BMC Bioinformatics.

[7]  P. Mozzicato MedDRA , 2020, Pharmaceutical Medicine.

[8]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.

[9]  Ted Pedersen,et al.  Extended Gloss Overlaps as a Measure of Semantic Relatedness , 2003, IJCAI.

[10]  Michael E. Lesk,et al.  Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[11]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[12]  Bridget T. McInnes,et al.  Knowledge-based method for determining the meaning of ambiguous biomedical terms using information content measures of similarity. , 2011, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[13]  Ted Pedersen,et al.  Semantic relatedness study using second order co-occurrence vectors computed from biomedical corpora, UMLS and WordNet , 2012, IHI '12.

[14]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[15]  Martin Chodorow,et al.  Combining local context and wordnet similarity for word sense identification , 1998 .

[16]  Christiane Fellbaum,et al.  Combining Local Context and Wordnet Similarity for Word Sense Identification , 1998 .

[17]  Ted Pedersen,et al.  Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts , 2006 .

[18]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .