LTAG-spinal treebank and parser for Hindi

Statistical parsers need large annotated treebanks to learn from, and building treebanks is expensive. Building a separate treebank for each grammar formalism in a language is therefore not feasible. A treebank available in one formalism can instead be converted into another, either automatically or with minimal human effort, by exploiting the similarities and differences between the two formalisms. In this work, we present an approach to extract an LTAG-spinal treebank from the Hyderabad Dependency Treebank for Hindi. LTAG-spinal is a variant of Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. A bidirectional LTAG dependency parser is trained on the extracted treebank, and we report an LTAG dependency accuracy of 80.86%.
