Scaling up WSD with Automatically Generated Examples

The most accurate approaches to Word Sense Disambiguation (WSD) for biomedical documents are based on supervised learning. However, these require manually labeled training examples, which are expensive to create; consequently, supervised WSD systems are normally limited to disambiguating a small set of ambiguous terms. An alternative approach is to create labeled training examples automatically and use them as a substitute for manually labeled ones. This paper describes a large-scale WSD system based on automatically labeled examples generated using information from the UMLS Metathesaurus. The labeled examples are generated without any use of labeled training data, so the approach is completely unsupervised (unlike some previous approaches). The system is evaluated on two widely used data sets and is found to outperform a state-of-the-art unsupervised approach that also uses information from the UMLS Metathesaurus.
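The abstract does not spell out which UMLS information is used or which classifier is trained, so the following is only a minimal sketch of the general idea, assuming the well-known "monosemous relatives" strategy from the WSD literature: for each sense of an ambiguous term, find unlabeled sentences that contain an unambiguous related term, substitute the ambiguous term, and label the result with that sense. The sense labels, relative terms, corpus sentences and function names below are all hypothetical, and the Naive Bayes classifier is a swapped-in choice rather than the paper's method.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sense inventory: unambiguous relatives for each sense of "cold".
# Illustrative only; a real system would derive these from the UMLS Metathesaurus.
RELATIVES = {
    "SENSE_COMMON_COLD": ["nasopharyngitis", "coryza"],
    "SENSE_COLD_TEMPERATURE": ["low temperature", "freezing"],
}

# Made-up stand-in for a large unlabeled corpus such as MEDLINE abstracts.
CORPUS = [
    "patients with nasopharyngitis reported sore throat and runny nose",
    "coryza is usually caused by a viral infection of the nose",
    "cells were stored at low temperature to slow metabolism",
    "exposure to freezing conditions reduced enzyme activity",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def generate_examples(ambiguous_term, relatives, corpus):
    """Create (tokens, sense) pairs without any manually labeled data."""
    examples = []
    for sense, terms in relatives.items():
        for sentence in corpus:
            for term in terms:
                pattern = r"\b" + re.escape(term) + r"\b"
                if re.search(pattern, sentence):
                    # Swap the relative for the ambiguous term; the label comes
                    # from the relative, which has only one sense.
                    replaced = re.sub(pattern, ambiguous_term, sentence)
                    examples.append((tokenize(replaced), sense))
                    break  # one example per sentence per sense
    return examples


def train_naive_bayes(examples):
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def classify(context, model):
    """Return the sense with the highest log-probability for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense in sense_counts:
        n_words = sum(word_counts[sense].values())
        score = math.log(sense_counts[sense] / total)
        for tok in tokenize(context):
            # Add-one (Laplace) smoothing over the training vocabulary.
            score += math.log((word_counts[sense][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    model = train_naive_bayes(generate_examples("cold", RELATIVES, CORPUS))
    print(classify("the cold caused a runny nose and sore throat", model))
    # prints SENSE_COMMON_COLD
    print(classify("samples were kept cold to slow enzyme activity", model))
    # prints SENSE_COLD_TEMPERATURE
```

The attraction of this design is that labeling cost no longer grows with the number of ambiguous terms: any term whose senses have unambiguous relatives in the knowledge source can receive training data, which is what makes a large-scale system feasible. The sketch matches terms naively and ignores sentences that contain relatives of several senses; a practical system would need to handle both.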
