Disambiguating Ambiguous Biomedical Terms in Biomedical Narrative Text: An Unsupervised Method

With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.

[1]  Hwee Tou Ng,et al.  Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach , 1996, ACL.

[2]  Lucila Ohno-Machado,et al.  A Comparison of Machine Learning Methods for the Diagnosis of Pigmented Skin Lesions , 2001, J. Biomed. Informatics.

[3]  Carol Friedman,et al.  Research Paper: A General Natural-language Text Processor for Clinical Radiology , 1994, J. Am. Medical Informatics Assoc..

[4]  Carol Friedman,et al.  A broad-coverage natural language processing system , 2000, AMIA.

[5]  W. DuMouchel,et al.  Unlocking Clinical Data from Narrative Reports: A Study of Natural Language Processing , 1995, Annals of Internal Medicine.

[6]  Janyce Wiebe,et al.  Word-Sense Disambiguation Using Decomposable Models , 1994, ACL.

[7]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[8]  David Yarowsky,et al.  DECISION LISTS FOR LEXICAL AMBIGUITY RESOLUTION: Application to Accent Restoration in Spanish and French , 1994, ACL.

[9]  Hwee Tou Ng,et al.  Exemplar-Based Word Sense Disambiguation” Some Recent Improvements , 1997, EMNLP.

[10]  Basehoar Ad Extracting knowledge from dynamics in gene expression. , 2001 .

[11]  N L Jain,et al.  Respiratory Isolation of Tuberculosis Patients Using Clinical Guidelines and an Automated Clinical Decision Support System , 1998, Infection Control & Hospital Epidemiology.

[12]  George A. Miller,et al.  Using Corpus Statistics and WordNet Relations for Sense Identification , 1998, CL.

[13]  Nancy Ide,et al.  Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art , 1998, Comput. Linguistics.

[14]  Hongfang Liu,et al.  Evaluating the UMLS as a source of lexical knowledge for medical language processing , 2001, AMIA.

[15]  Claire Cardie,et al.  A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis , 1993, AAAI.

[16]  William T. Hole,et al.  Managing Name Ambiguity in the UMLS Metathesaurus , 2000, AMIA.

[17]  Yorick Wilks,et al.  Combining Weak Knowledge Sources for Sense Disambiguation , 1999, IJCAI.

[18]  Adam Kilgarriff,et al.  Framework and Results for English SENSEVAL , 2000, Comput. Humanit..

[19]  Ellen M. Voorhees,et al.  Corpus-Based Statistical Sense Resolution , 1993, HLT.

[20]  Raymond J. Mooney,et al.  Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning , 1996, EMNLP.

[21]  Edward H. Shortliffe,et al.  The application of certainty factors to neural computing for rule discovery , 2000, IEEE Trans. Neural Networks Learn. Syst..

[22]  David A. Campbell,et al.  Comparing syntactic complexity in medical and non-medical corpora , 2001, AMIA.

[23]  Hwee Tou Ng,et al.  Corpus-Based Approaches to Semantic Interpretation in Natural Language Processing , 1997 .

[24]  Lluís Màrquez i Villodre,et al.  Naive Bayes and Exemplar-based Approaches to Word Sense Disambiguation Revisited , 2000, ECAI.

[25]  B. A. Engel,et al.  INTEGRATING MULTIPLE KNOWLEDGE SOURCES , 1990 .

[26]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[27]  S. Johnson A semantic lexicon for medical language processing. , 1999, Journal of the American Medical Informatics Association : JAMIA.

[28]  Peter L. Elkin,et al.  UMLS Concept Indexing for Production Databases: A Feasibility Study , 2001, J. Am. Medical Informatics Assoc..

[29]  D. Bloom,et al.  Acronyms, abbreviations and initialisms , 2000, BJU international.

[30]  T O Cheng,et al.  Acronyms of clinical trials in cardiology--1998. , 1999, American heart journal.

[31]  Hongfang Liu,et al.  A study of abbreviations in the UMLS , 2001, AMIA.

[32]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[33]  T C Rindflesch,et al.  Ambiguity resolution while mapping free text to the UMLS Metathesaurus. , 1994, Proceedings. Symposium on Computer Applications in Medical Care.

[34]  Alan R. Aronson,et al.  Exploiting a Large Thesaurus for Information Retrieval , 1994, RIAO.

[35]  Cynthia Brandt,et al.  Research Paper: UMLS Concept Indexing for Production Databases: A Feasibility Study , 2001, J. Am. Medical Informatics Assoc..

[36]  G Hripcsak,et al.  Evaluating Natural Language Processors in the Clinical Domain , 1998, Methods of Information in Medicine.

[37]  Ronald L. Rivest,et al.  Learning decision lists , 2004, Machine Learning.