Acquiring Sense Tagged Examples using Relevance Feedback

Supervised approaches to Word Sense Disambiguation (WSD) have been shown to outperform other approaches but are hampered by reliance on labeled training examples (the data acquisition bottleneck). This paper presents a novel approach to the automatic acquisition of labeled examples for WSD which makes use of the Information Retrieval technique of relevance feedback. This semi-supervised method generates additional labeled examples based on existing annotated data. Our approach is applied to a set of ambiguous terms from biomedical journal articles and found to significantly improve the performance of a state-of-the-art WSD system.
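As a rough illustration of how relevance feedback could turn existing annotated data into search queries for new examples, the sketch below scores terms in a Rocchio style: terms frequent in contexts labeled with the target sense are weighted up, and terms frequent in contexts labeled with other senses are weighted down. This is not the paper's actual procedure. The function name `rocchio_query_terms`, the weights `beta` and `gamma`, the whitespace tokenization, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def rocchio_query_terms(relevant, non_relevant, k=5, beta=0.75, gamma=0.25):
    """Return up to k terms that best separate relevant from non-relevant contexts.

    Hypothetical helper: scores each term as
    beta * (mean frequency in relevant) - gamma * (mean frequency in non-relevant).
    """
    def mean_tf(docs):
        # Average raw term frequency per document (naive whitespace tokens).
        totals = Counter()
        for doc in docs:
            totals.update(doc.lower().split())
        n = max(len(docs), 1)
        return {term: count / n for term, count in totals.items()}

    rel = mean_tf(relevant)
    non = mean_tf(non_relevant)
    scores = {t: beta * w - gamma * non.get(t, 0.0) for t, w in rel.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, s in ranked[:k] if s > 0]


# Toy labeled contexts for two senses of the ambiguous term "cold" (invented data).
cold_temperature = [
    "cold water immersion lowers body temperature",
    "exposure to cold temperature stress",
]
cold_illness = [
    "common cold virus infection symptoms",
    "patients with a cold and cough",
]

terms = rocchio_query_terms(cold_temperature, cold_illness, k=3)
print(terms)
```

Under these assumptions, the selected terms could be combined with the ambiguous term into a retrieval query, and the documents returned would be treated as additional examples of the target sense.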
