Conversion from Paninian Karakas to Universal Dependencies for Hindi Dependency Treebank

Universal Dependencies (UD) has recently gained much attention as a framework for the systematic evaluation of cross-lingual dependency parsing techniques. In this paper we present our work in line with UD. Our contribution is manifold: we extend UD to Indian languages by converting Pāṇinian Dependencies to UD for the Hindi Dependency Treebank (HDTB); we discuss the differences in annotation between the two schemes; and we present parsing experiments for both formalisms and empirically evaluate their strengths and weaknesses for Hindi. The result is an automatically converted Hindi treebank conforming to the international standard UD scheme, making it useful as a resource for multilingual language technology.
