Unsupervised Word Sense Induction from Multiple Semantic Spaces with Locality Sensitive Hashing

Word Sense Disambiguation is the task of determining the sense of a word in context from among its possible senses. Solving this problem requires knowing the set of possible senses for a given word, which can be acquired from human knowledge or through automatic discovery, called Word Sense Induction. In this article, we adapt two existing meta-methods of Word Sense Induction for the automatic construction of a disambiguation lexicon. Our adaptation is based on multiple semantic spaces (also called Word Space Models) produced from a syntactic analysis of a very large number of web pages. These adaptations and the results presented in this article differ from the original methods in that they use a combination of several high-dimensional spaces instead of a single representation. Each of these competing semantic spaces takes part in a clustering phase in which they vote on sense induction.
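
As a rough illustration of how several semantic spaces could each contribute a vote through locality sensitive hashing, the sketch below uses random-hyperplane signatures, a standard LSH family for cosine similarity. The abstract does not specify the hashing scheme or the voting rule, so everything here is an assumption made for illustration: the function names (`make_hyperplanes`, `signature`, `hamming_similarity`, `spaces_vote`), the majority rule, and the `threshold` value are hypothetical and are not taken from the article.

```python
import random


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random Gaussian hyperplanes of dimension dim (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming_similarity(sig_a, sig_b):
    """Fraction of matching bits; higher values suggest a smaller angle between vectors."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def spaces_vote(word_a, word_b, spaces, planes_per_space, threshold=0.8):
    """Assumed voting rule: each space votes 'similar' when the signature
    similarity reaches the threshold; a strict majority decides."""
    votes = 0
    for space, planes in zip(spaces, planes_per_space):
        sig_a = signature(space[word_a], planes)
        sig_b = signature(space[word_b], planes)
        if hamming_similarity(sig_a, sig_b) >= threshold:
            votes += 1
    return votes > len(spaces) / 2
```

Here each space is assumed to be a plain dictionary mapping a word to its vector, and each space has its own set of hyperplanes because the spaces may have different dimensionalities. A real system would replace the pairwise majority vote with whatever combination rule the clustering phase actually uses.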
