Annotation and Issues in Building an English Dependency Treebank

The Paninian Grammar framework, formulated by Panini for his analysis of the Sanskrit language, is being applied extensively to languages other than Sanskrit some two thousand five hundred years after its formulation. The work presented in this paper is one such application: it extends Paninian Grammar (PG), or Computational Paninian Grammar (CPG), to English, a fixed word order language. It shows how CPG can account for English and makes available a linguistically rich resource in the form of an English Dependency Treebank. At present, 2000 sentences have been annotated as part of this effort using the Hyderabad Dependency Treebank (HyDT) annotation scheme for Indian languages, which is modelled on CPG. In this paper we describe CPG and the annotation scheme used for this work. We then discuss the annotation of the English data under the scheme and how its application to English differs from its application to Hindi. Finally, we discuss our handling of some English constructions, and some anomalies in the language that pose a challenge to applying this annotation scheme to English as is.
