Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing

Word Sense Disambiguation is the task of finding the sense of a word in context from among all of its possible senses. Solving this problem requires knowing the set of possible senses for a given word, which can be acquired from human knowledge or by automatic discovery, called Word Sense Induction. In this article, we adapt two existing meta-methods of Word Sense Induction for the automatic construction of a disambiguation lexicon. Our adaptation is based on multiple semantic spaces (also called Word Space Models) produced from a syntactic analysis of a very large number of web pages. These adaptations and the results presented in this article differ from the original methods in that they use a combination of several high-dimensional spaces instead of a single representation. Each of these competing semantic spaces takes part in a clustering phase in which they vote on sense induction.
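To make the idea of several spaces voting during clustering concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say which hash family or voting rule is used, so the random-hyperplane signatures, the similarity threshold, the majority rule, and all names (`make_hyperplanes`, `signature`, `vote_same_cluster`, the toy vectors) are assumptions chosen only for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def hamming_similarity(sig_a, sig_b):
    """Fraction of matching bits; a cheap proxy for angular similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def vote_same_cluster(word_a, word_b, spaces, planes_per_space, threshold=0.75):
    """Each space casts one vote; a strict majority decides (assumed rule)."""
    votes = 0
    for space, planes in zip(spaces, planes_per_space):
        sim = hamming_similarity(
            signature(space[word_a], planes), signature(space[word_b], planes)
        )
        if sim >= threshold:
            votes += 1
    return votes * 2 > len(spaces)


# Hypothetical toy data: three tiny "semantic spaces" over the same two words.
spaces = [
    {"bank": [1.0, 2.0, 3.0], "shore": [2.0, 4.0, 6.0]},    # same direction
    {"bank": [0.5, -1.0, 2.0], "shore": [1.0, -2.0, 4.0]},  # same direction
    {"bank": [1.0, 0.0, -1.0], "shore": [-1.0, 0.0, 1.0]},  # opposite direction
]
planes_per_space = [make_hyperplanes(3, 16, seed=i) for i in range(len(spaces))]
same_sense = vote_same_cluster("bank", "shore", spaces, planes_per_space)
```

In this toy setup two of the three spaces place the words in the same direction, so the majority vote groups them together even though the third space disagrees. A real system would work with far higher-dimensional vectors and a proper clustering step on top of the pairwise votes.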

Guillaume Pitel | Gaël de Chalendar | Anne Vilnat | Claire Mouton
