An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model

This paper describes a new Word Sense Disambiguation (WSD) algorithm that extends two well-known variations of the Lesk WSD method. Given a word and its context, the Lesk algorithm selects the sense whose definition (gloss) shares the largest number of words with the context, that is, the sense with the maximum gloss-context overlap. The main contribution of our approach is the use of a word similarity function defined on a distributional semantic space to compute the gloss-context overlap. As the sense inventory we adopt BabelNet, a large multilingual semantic network built by exploiting both WordNet and Wikipedia. Besides linguistic knowledge, BabelNet also represents encyclopedic concepts coming from Wikipedia. The evaluation performed on the SemEval-2013 Multilingual Word Sense Disambiguation task shows that our algorithm outperforms both the most frequent sense baseline and the simplified version of the Lesk algorithm. Moreover, when compared with the other participants in the SemEval-2013 task, our approach outperforms the best system for English.
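The difference between counting shared words and scoring similarity in a distributional space can be illustrated with a minimal sketch. It is not the paper's implementation: the two-dimensional word vectors, the sense labels, the glosses, and the choice of a centroid plus cosine similarity as the composition and similarity functions are all assumptions made for illustration, and no BabelNet data is used.

```python
import math

def overlap_score(context, gloss):
    """Simplified Lesk score: number of distinct words shared by context and gloss."""
    return len(set(context) & set(gloss))

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(words, space, dim):
    """Average the vectors of the words found in the space (assumed composition)."""
    vecs = [space[w] for w in words if w in space]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def similarity_score(context, gloss, space, dim):
    """Distributional score: cosine between the context and gloss centroids."""
    return cosine(centroid(context, space, dim), centroid(gloss, space, dim))

def distributional_lesk(context, glosses, space, dim):
    """Return the sense whose gloss is most similar to the context."""
    return max(glosses, key=lambda s: similarity_score(context, glosses[s], space, dim))

# Hypothetical toy space: axis 0 ~ "finance", axis 1 ~ "nature".
SPACE = {
    "deposit": [0.9, 0.1], "cash": [1.0, 0.0], "account": [0.8, 0.2],
    "institution": [0.7, 0.3], "money": [0.9, 0.1], "loans": [0.8, 0.1],
    "sloping": [0.2, 0.8], "land": [0.1, 0.9], "water": [0.0, 1.0],
}

# Hypothetical glosses for two senses of "bank" (not taken from BabelNet).
GLOSSES = {
    "bank_river": ["sloping", "land", "water"],
    "bank_finance": ["institution", "money", "loans"],
}

CONTEXT = ["deposit", "cash", "account"]

best_sense = distributional_lesk(CONTEXT, GLOSSES, SPACE, dim=2)
```

In this toy setting the context shares no word with either gloss, so the overlap count is zero for both senses and cannot separate them, whereas the distributional score still prefers the financial sense because the context words lie close to the gloss words in the vector space.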
