A new method for updating word senses in Hindi WordNet

Hindi WordNet, a rich computational lexicon, is widely used in Hindi Natural Language Processing (NLP) applications. However, it does not presently provide an exhaustive list of senses for every word, which degrades the performance of such applications. In this paper, we propose a graph-based model and associated techniques to acquire word senses automatically. To our knowledge, no method in the literature can automatically identify the senses of Hindi words. We build the graph model from a Hindi part-of-speech-tagged corpus. Links between noun-noun concepts are extracted on the basis of syntactic and semantic relationships. All senses of a word are extracted, including senses that are not present in Hindi WordNet. Our method also finds categories of similar words. Using this model, NLP applications can achieve higher performance.
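To make the idea of a noun-noun graph concrete, the following is a minimal sketch and not the method of this paper. It assumes a toy corpus of (word, tag) pairs with hypothetical tags "NN" and "CC", links nouns that occur within a small token window (a stand-in for the syntactic and semantic relationships mentioned above), and treats each connected group of a word's neighbours as one candidate sense. The function names, the window size, and the transliterated example words are all illustrative assumptions.

```python
from collections import defaultdict


def build_noun_graph(tagged_sentences, noun_tag="NN", window=2):
    """Link two distinct nouns when they occur within `window` tokens."""
    graph = defaultdict(set)
    for sentence in tagged_sentences:
        nouns = [(i, w) for i, (w, t) in enumerate(sentence) if t == noun_tag]
        for a in range(len(nouns)):
            for b in range(a + 1, len(nouns)):
                (i, u), (j, v) = nouns[a], nouns[b]
                if u != v and j - i <= window:
                    graph[u].add(v)
                    graph[v].add(u)
    return graph


def candidate_senses(graph, word):
    """Cluster the neighbours of `word` into connected components,
    ignoring `word` itself; each component is one candidate sense."""
    neighbours = set(graph.get(word, ()))
    seen = set()
    clusters = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            # Only walk edges that stay inside the neighbourhood of `word`.
            stack.extend(n for n in graph[node]
                         if n in neighbours and n not in seen)
        clusters.append(component)
    return clusters


# Toy corpus (assumed): "uttar" can mean "north" or "answer".
tagged = [
    [("uttar", "NN"), ("aur", "CC"), ("dakshin", "NN")],  # north and south
    [("dakshin", "NN"), ("aur", "CC"), ("purab", "NN")],  # south and east
    [("uttar", "NN"), ("aur", "CC"), ("purab", "NN")],    # north and east
    [("prashn", "NN"), ("aur", "CC"), ("uttar", "NN")],   # question and answer
]

senses = candidate_senses(build_noun_graph(tagged), "uttar")
print(senses)
```

In this toy corpus the neighbours of "uttar" fall into two groups, one for directions and one for "prashn" (question), which is the kind of separation a sense-acquisition method would compare against the senses already listed in Hindi WordNet.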
