Increasing UMLS Coverage and Reducing Ambiguity via Automated Creation of Synonymous Terms

Background: Extensive synonymy is one of the greatest strengths of the UMLS Metathesaurus, yet much research has focused on identifying and measuring gaps in that synonymy. This paper proposes a methodology for extending the UMLS's already rich synonymy by semi-automatically creating new strings that are not in the UMLS and adding them as synonymous strings within existing UMLS concepts.

Results: We present our methodology for identifying missing UMLS synonymy and semi-automatically creating synonyms to fill these gaps. We created an enhanced Metathesaurus supplemented with these strings; using it improved the performance of two well-known named-entity-recognition applications at the US National Library of Medicine, MetaMap and the Medical Text Indexer (MTI), on both biomedical literature and clinical text.

Conclusions: Our methods are first steps toward filling some of the synonymy gaps in the UMLS. We further theorize that some of the newly created strings could also be used to extend the Medical Subject Headings (MeSH) entry terms, thereby enhancing MEDLINE indexing and PubMed queries by better reflecting how authors actually refer to biomedical concepts in the literature.
