Research Paper: Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS

MOTIVATION The UMLS has been used in natural language processing applications such as information retrieval and information extraction systems. Mapping free text to UMLS concepts is important for these applications. To improve the mapping, we need a method to disambiguate terms that map to multiple UMLS concepts. In the general English domain, machine-learning techniques have been applied to sense-tagged corpora, in which the senses (or concepts) of ambiguous terms have been annotated, mostly manually. Sense disambiguation classifiers are then derived from these corpora to determine the senses of ambiguous terms automatically. However, manual annotation of a corpus is expensive. We propose an automatic method that constructs sense-tagged corpora for ambiguous terms in the UMLS using MEDLINE abstracts.

METHODS For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that are related to W, as defined in the UMLS, are automatically identified. A sense-tagged corpus, in which the senses of W are annotated, is then derived from those identified concepts. The method was evaluated on a set of 35 frequently occurring ambiguous biomedical abbreviations using an automatically derived gold standard. The quality of the derived sense-tagged corpus was measured using precision and recall.

RESULTS The derived sense-tagged corpus had an overall precision of 92.9% and an overall recall of 47.4%. After removing rare senses and ignoring abbreviations with closely related senses, the overall precision was 96.8% and the overall recall was 50.6%.

CONCLUSIONS UMLS conceptual relations and MEDLINE abstracts can be used to automatically acquire the knowledge needed to resolve ambiguity when mapping free text to UMLS concepts.
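
The tagging idea and the two quality measures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abbreviation, the sense labels, the related-term sets, the rule of tagging an abstract only when exactly one sense has a matching related term, and the names `RELATED_TERMS`, `tag_abstract`, and `precision_recall` are all assumptions made for illustration. Recall is assumed here to be the fraction of all gold-standard instances that receive the correct tag.

```python
import re

# Hypothetical toy data: for one ambiguous abbreviation, each candidate sense
# is paired with single-word terms standing in for UMLS-related concepts.
RELATED_TERMS = {
    "PCA": {
        "principal component analysis": {"eigenvalue", "variance", "covariance"},
        "patient-controlled analgesia": {"morphine", "pain", "opioid"},
    }
}


def tag_abstract(term, abstract, related_terms=RELATED_TERMS):
    """Return the one sense whose related terms occur in the abstract.

    Assumption: an abstract is tagged only when exactly one sense matches;
    abstracts with no match or with conflicting matches are left untagged.
    """
    tokens = set(re.findall(r"[a-z]+", abstract.lower()))
    matched = [
        sense
        for sense, related in related_terms[term].items()
        if tokens & related
    ]
    return matched[0] if len(matched) == 1 else None


def precision_recall(predicted, gold):
    """Precision over tagged instances; recall over all gold instances."""
    tagged = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in tagged if p == g)
    precision = correct / len(tagged) if tagged else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


abstracts = [
    "Morphine was delivered via PCA for postoperative pain.",
    "PCA was applied to the covariance matrix to rank variance.",
    "PCA was evaluated in twenty subjects.",
    "PCA of pain scores explained most of the variance.",
]
gold = [
    "patient-controlled analgesia",
    "principal component analysis",
    "patient-controlled analgesia",
    "principal component analysis",
]
predicted = [tag_abstract("PCA", a) for a in abstracts]
precision, recall = precision_recall(predicted, gold)
```

In this toy run the third abstract has no related term and the fourth has conflicting ones, so both stay untagged. Leaving uncertain abstracts untagged is one way a corpus can reach high precision at the cost of recall, which is the pattern the reported figures show.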
