Word Sense Determination from Wikipedia Data Using Neural Networks

Many words have multiple meanings. For example, “plant” can mean a type of living organism or a factory. Being able to determine the sense of such words is very useful in natural language processing tasks, such as speech synthesis, question answering, and machine translation. For the project described in this report, we used a modular model to classify the sense of words to be disambiguated. This model consisted of two parts: The first part was a neural-network-based language model to compute continuous vector representations of words from data sets created from Wikipedia pages. The second part classified the meaning of the given word without explicitly knowing what the meaning is. In this unsupervised word sense determination task, we did not need human-tagged training data or a dictionary of senses for each word. We tested the model with some naturally ambiguous words, and compared our experimental results with the related work by Schutze in 1998. Our model achieved similar accuracy as Schutze’s work for some words.

[1]  Lukás Burget,et al.  Empirical Evaluation and Combination of Advanced Language Modeling Techniques , 2011, INTERSPEECH.

[2]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[3]  Thorsten Brants,et al.  Large Language Models in Machine Translation , 2007, EMNLP.

[4]  Michael E. Lesk,et al.  Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[5]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[6]  Patrick Pantel,et al.  Discovering word senses from text , 2002, KDD.

[7]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[8]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[9]  Hinrich Schütze,et al.  Automatic Word Sense Discrimination , 1998, Comput. Linguistics.

[10]  H. Schütze,et al.  Dimensions of meaning , 1992, Supercomputing '92.

[11]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[12]  Geoffrey E. Hinton,et al.  Distributed Representations , 1986, The Philosophy of Artificial Intelligence.

[13]  Holger Schwenk,et al.  Continuous space language models , 2007, Comput. Speech Lang..

[14]  Léon Bottou,et al.  Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.

[15]  Aapo Hyvärinen,et al.  Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics , 2012, J. Mach. Learn. Res..

[16]  James R. Curran,et al.  Improvements in Automatic Thesaurus Extraction , 2002, ACL 2002.

[17]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[18]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.