A TENGRAM method based part-of-speech tagging of multi-category words in Hindi language

In this paper, we address the problem of part-of-speech tagging of multi-category words that appear within sentences of the Hindi language. First, a Hindi tagger is proposed that assigns part-of-speech tags developed from the grammar of Hindi. For this purpose, Hindi Devanagari characters are used, and their transliteration is performed within the proposed tagger. Thereafter, a rule-based TENGRAM method is described with an illustrative example; it guides the disambiguation of multi-category words within sentences of a Hindi corpus. The rules generated in TENGRAM result from the computation of discernibility matrices, discernibility functions and reducts. These computations are derived from decision tables, which are based on the theory of rough sets. In essence, a discernibility matrix helps eliminate indiscernible condition attributes; a discernibility function, which has rows corresponding to each column of the discernibility matrix, yields the reducts; and the reducts provide minimal subsets of attributes that preserve the indiscernibility relation of the decision tables and hence generate the decision rules.
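The rough-set pipeline named above (decision table, discernibility matrix, reducts, decision rules) can be illustrated with a small sketch. This is not the TENGRAM implementation: the decision table, the attribute names (`prev_tag`, `next_tag`, `position`), the tag values and all function names are hypothetical, chosen only to show the general technique. Reducts are found here by brute-force search for minimal attribute sets that intersect every non-empty matrix entry, which is equivalent to simplifying the discernibility function for a table this small.

```python
from itertools import combinations

# Hypothetical toy decision table (NOT from the paper): each row describes the
# context of an ambiguous word; the second element is the decision (POS tag).
ATTRS = ("prev_tag", "next_tag", "position")
TABLE = [
    ({"prev_tag": "NOUN", "next_tag": "VERB", "position": "mid"}, "ADJ"),
    ({"prev_tag": "NOUN", "next_tag": "NOUN", "position": "mid"}, "NOUN"),
    ({"prev_tag": "PRON", "next_tag": "VERB", "position": "end"}, "ADJ"),
    ({"prev_tag": "PRON", "next_tag": "NOUN", "position": "end"}, "NOUN"),
]


def discernibility_matrix(table, attrs):
    """For each pair of rows with different decisions, record the condition
    attributes on which the two rows differ."""
    matrix = {}
    for (i, (ci, di)), (j, (cj, dj)) in combinations(enumerate(table), 2):
        if di != dj:
            matrix[(i, j)] = frozenset(a for a in attrs if ci[a] != cj[a])
    return matrix


def reducts(matrix, attrs):
    """Brute-force search for minimal attribute subsets that intersect every
    non-empty matrix entry (the prime implicants of the discernibility
    function). Exponential in len(attrs); fine for a toy table only."""
    entries = [e for e in matrix.values() if e]
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            candidate = set(subset)
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= candidate for r in found):
                continue
            if all(candidate & e for e in entries):
                found.append(subset)
    return found


def decision_rules(table, reduct):
    """Project the table onto a reduct; each distinct condition pattern maps
    to the set of decisions observed for it."""
    rules = {}
    for cond, dec in table:
        key = tuple((a, cond[a]) for a in reduct)
        rules.setdefault(key, set()).add(dec)
    return rules


if __name__ == "__main__":
    m = discernibility_matrix(TABLE, ATTRS)
    for r in reducts(m, ATTRS):
        print("reduct:", r)
        for cond, decs in decision_rules(TABLE, r).items():
            print("  IF", dict(cond), "THEN", sorted(decs))
```

In this toy table the only reduct is `next_tag`, so the two resulting rules decide the tag from the following word's tag alone. A real tagger would use far more attributes and rows, where brute-force enumeration would need to be replaced by Boolean simplification or a heuristic search.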
