Word Sense Disambiguation in Hindi Language Using Hyperspace Analogue to Language and Fuzzy C-Means Clustering

Word Sense Disambiguation (WSD) is the task of assigning the most appropriate sense to a polysemous word within a given context. Many supervised, unsupervised, and semi-supervised approaches have been devised for this problem, particularly for English. Comparatively little work, however, has been done for Hindi. In this paper, we develop a new approach to disambiguation in Hindi. To train the system, Hindi text is converted into Hyperspace Analogue to Language (HAL) vectors, thereby mapping each word into a high-dimensional space. To handle the fuzziness involved in disambiguating words, we apply the Fuzzy C-Means clustering algorithm to form clusters denoting the various contexts in which a polysemous word may occur. The test data is then mapped into the high-dimensional space created during the training phase. We test our approach on a corpus created from Hindi news articles and Wikipedia. We compare it with other significant approaches in the literature, and the experimental results indicate that it outperforms all previous work on Hindi.
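
The two building blocks named above, HAL vectors and Fuzzy C-Means clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hal_vectors` and `fuzzy_c_means`, the window size, the linear distance weighting, the fuzzifier `m = 2`, and the random membership initialisation are all assumptions chosen for illustration, and the toy input is not from the paper's corpus.

```python
import math
import random


def hal_vectors(tokens, window=5):
    """Build HAL-style vectors from weighted co-occurrence in a sliding window.

    Assumption: a word at distance d (1 <= d <= window) before the focus
    word contributes weight window - d + 1.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # matrix[a][b] accumulates the weight of b occurring before a.
    matrix = [[0.0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            prev = tokens[pos - dist]
            matrix[index[word]][index[prev]] += window - dist + 1
    # A word's vector is its row (preceding context) followed by its
    # column (following context).
    vectors = {}
    for w, i in index.items():
        column = [matrix[j][i] for j in range(n)]
        vectors[w] = matrix[i] + column
    return vocab, vectors


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(
                [sum(weights[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Memberships are recomputed from relative distances to the centers.
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # A point sitting exactly on a center belongs fully to it.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                new_u.append(
                    [1.0 / sum((dists[j] / dists[k]) ** exponent
                               for k in range(c))
                     for j in range(c)]
                )
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u
```

In a disambiguation setting, each context of a polysemous word would be represented as a point in the HAL space, and the membership row returned for that point gives a graded degree of belonging to each context cluster rather than a single hard label. How contexts are composed from word vectors and how clusters are mapped to senses is not specified in the abstract, so it is left out of this sketch.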
