Exploiting domain information for Word Sense Disambiguation of medical documents

Objective: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears.

Design: The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Headings (MeSH) terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context.

Measurements: A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset.

Results and discussion: The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained with key terms derived by relevance feedback and weighted by inverse document frequency.
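To make the idea of key terms weighted by inverse document frequency concrete, here is a minimal Python sketch. It is not the authors' procedure: the toy corpus, the function names `idf_weights` and `key_terms`, and the assignment of documents to a heading are all assumptions made for illustration. It only shows how terms that occur in almost every document (such as "patient" below) are down-weighted when ranking candidate key terms for a heading.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return idf(t) = log(N / df(t)) for every term in the collection."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def key_terms(heading_docs, idf, top_k=3):
    """Rank terms from the documents of one heading by tf * idf."""
    tf = Counter(token for tokens in heading_docs for token in tokens)
    scored = {term: count * idf.get(term, 0.0) for term, count in tf.items()}
    # Sort by descending score, then alphabetically to break ties.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy tokenised corpus (hypothetical data, not from the paper).
corpus = [
    ["cold", "virus", "infection", "patient"],
    ["cold", "temperature", "exposure", "patient"],
    ["virus", "infection", "vaccine", "patient"],
    ["temperature", "hypothermia", "exposure", "patient"],
]
idf = idf_weights(corpus)

# Assume documents 0 and 2 are indexed under one heading (illustrative choice).
virus_docs = [corpus[0], corpus[2]]
terms = key_terms(virus_docs, idf)
print(terms)
```

In this toy setting "patient" appears in every document, so its weight is zero and it never enters the key-term list, while topic-specific terms such as "virus" and "vaccine" rank highest.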
