Towards clustering-based word sense discrimination

This paper describes a series of experiments in grouping similar words using context features derived from a corpus. The goal is to find an approach suitable for cleaning the fuzzy WordNet synsets obtained by automatically translating Serbian synsets into Slovene. Similar techniques have already been used successfully by a number of researchers, and they are particularly attractive because they are knowledge-lean and rely on evidence found in simple raw text. A selection of features and settings is tested on sample test sets with an unsupervised machine learning method called hierarchical clustering. In the final part of the paper, the obtained results are analyzed and the optimal set of features is selected, followed by a discussion of the results and some further
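As an illustration of the general idea only, the sketch below groups words by clustering their context vectors. It is not the paper's setup: the toy corpus, the sentence-level co-occurrence features, the cosine similarity measure, the average-link criterion, and the names `context_vectors`, `cosine`, `average_link`, and `cluster` are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus; each sentence serves as one context window.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the car needs fuel",
    "the truck needs fuel",
    "the car passes the truck",
]
targets = ["cat", "dog", "car", "truck"]


def context_vectors(sentences, targets):
    """Count the words that co-occur with each target word in a sentence."""
    vectors = {t: Counter() for t in targets}
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in vectors:
                # Every other token in the sentence is a context feature.
                vectors[tok].update(tokens[:i] + tokens[i + 1:])
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(cosine(vectors[x], vectors[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster(vectors, k):
    """Agglomerative clustering: merge the most similar pair until k clusters remain."""
    clusters = [[w] for w in vectors]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: average_link(clusters[p[0]], clusters[p[1]], vectors))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


result = cluster(context_vectors(corpus, targets), k=2)
print(result)  # [['cat', 'dog'], ['car', 'truck']]
```

On real data the raw counts would normally be weighted and filtered before clustering, and the number of clusters would have to be chosen or the dendrogram cut at a similarity threshold; this sketch fixes `k` by hand to keep the example short.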
