Simple Window Selection Strategies for the Simplified Lesk Algorithm for Word Sense Disambiguation

The Simplified Lesk Algorithm (SLA) is frequently used for word sense disambiguation. It selects the sense of a target word whose dictionary definition has the largest overlap with the words in the surrounding context. The algorithm is simple and fast, but its accuracy is relatively low. We propose simple strategies for context window selection that improve the performance of the SLA: (1) constructing the window only from words that overlap with some sense of the target word, (2) excluding the target word itself from matching, and (3) avoiding repetitions in the context window. We describe the corresponding experiments and compare the results with other, more complex knowledge-based algorithms.
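As a rough illustration of how the three window selection strategies might fit together, the following Python sketch applies them to a toy sense inventory. It is not the authors' implementation: the `SENSES` glosses, the function names `select_window` and `simplified_lesk`, the outward walk from the target word, the default window size of 4, and the tie-breaking rule (first listed sense wins) are all assumptions made for this example. Glosses are assumed to be already tokenized and stripped of stop words.

```python
from typing import Dict, List, Set

# Hypothetical toy sense inventory (not taken from any real dictionary).
SENSES: Dict[str, Set[str]] = {
    "bank#1": {"financial", "institution", "accepts", "deposits", "money"},
    "bank#2": {"sloping", "land", "beside", "river", "water"},
}


def select_window(tokens: List[str], target_idx: int,
                  senses: Dict[str, Set[str]], size: int) -> List[str]:
    """Collect up to `size` context words, walking outward from the target."""
    target = tokens[target_idx]
    gloss_vocab: Set[str] = set().union(*senses.values())
    window: List[str] = []
    seen: Set[str] = set()
    for dist in range(1, len(tokens)):
        for idx in (target_idx - dist, target_idx + dist):
            if not 0 <= idx < len(tokens):
                continue
            word = tokens[idx]
            if word == target:           # strategy (2): skip the target word itself
                continue
            if word in seen:             # strategy (3): no repeated words in the window
                continue
            if word not in gloss_vocab:  # strategy (1): keep only words overlapping some sense
                continue
            seen.add(word)
            window.append(word)
            if len(window) == size:
                return window
    return window


def simplified_lesk(tokens: List[str], target_idx: int,
                    senses: Dict[str, Set[str]], size: int = 4) -> str:
    """Return the sense whose gloss shares the most words with the window."""
    window = set(select_window(tokens, target_idx, senses, size))
    scores = {sense: len(gloss & window) for sense, gloss in senses.items()}
    # Assumption: ties (including all-zero scores) go to the first listed sense.
    return max(scores, key=scores.get)


sentence = "he sat on the bank of the river and watched the water flow past the bank".split()
chosen = simplified_lesk(sentence, target_idx=4, senses=SENSES)
```

Because the window is filled only with words that match some gloss, it can reach past nearby function words to more distant but informative context, which is the intuition behind strategy (1). A fixed-size window of adjacent words would, in this sentence, consist mostly of words such as "the" and "of".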
