Automatic Generation of Synsets for Wordnet of Hindi Language

India is a land of 122 languages and numerous dialects. The lack of adequate lexical resources for Indian languages is a pervasive problem that hinders the development of NLP tools for these languages. Recent efforts such as the Indo WordNet project have contributed significantly to reducing the scarcity of lexicons, but their progress and coverage remain a matter of dispute. Bottlenecks such as cost, time, and the need for skilled lexicographers further slow progress. In this article, the authors propose a technique that automates the generation of lexical entries using a machine learning approach, which expedites the construction of lexicons such as WordNet. The reluctance to adopt automated approaches is largely attributed to their lack of accuracy, their inability to capture the regional flavor of a language, incorrect back-translation, and similar problems. To address these issues, the authors use Wikipedia to validate the generated synsets.
