Construction Grammar Based Annotation Framework for Parsing Tamil

Syntactic parsing in NLP is the task of working out the grammatical structure of sentences. Purely formal approaches to parsing, such as phrase structure grammar and dependency grammar, have been successfully employed for a variety of languages. While phrase-structure-based constituent analysis is possible for fixed word order languages such as English, dependency analysis between grammatical units has been suitable for many free word order languages. These approaches rely on identifying linguistic units based on their formal syntactic properties and establishing the relationships between such units in the form of a tree. Instead, we characterize every morphosyntactic unit as a mapping between form and function, along the lines of Construction Grammar, and treat parsing as the identification of dependency relations between such conceptual units. Our approach to parser annotation achieves an average labeled attachment score (LAS) of 82.21% with MaltParser on a gold-annotated Tamil corpus of 935 sentences in a five-fold cross-validation experiment.
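To make the reported metric concrete, the following is a minimal sketch of how a labeled attachment score and a five-fold average could be computed. It is not the authors' evaluation code: the function names (`las`, `k_fold_indices`, `mean_las`), the contiguous fold split, and the dependency labels in the usage note are assumptions made purely for illustration.

```python
from typing import Iterator, List, Tuple

# One token is represented as (head index, dependency label).
# This representation is an assumption chosen for illustration.
Token = Tuple[int, str]
Sentence = List[Token]


def las(gold: List[Sentence], pred: List[Sentence]) -> float:
    """Percentage of tokens whose predicted head AND label both match gold."""
    correct = 0
    total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for g_tok, p_tok in zip(g_sent, p_sent):
            total += 1
            if g_tok == p_tok:
                correct += 1
    return 100.0 * correct / total if total else 0.0


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) sentence indices for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def mean_las(fold_scores: List[float]) -> float:
    """Average the per-fold LAS values into one reported score."""
    return sum(fold_scores) / len(fold_scores)
```

As a usage example with made-up labels: if the gold analysis of a two-token sentence is `[(2, "subj"), (0, "root")]` and the parser predicts `[(2, "obj"), (0, "root")]`, one of two tokens is fully correct, so `las` returns 50.0. With 935 sentences and five folds, `k_fold_indices(935)` produces test sets of 187 sentences each, and `mean_las` averages the five per-fold scores.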
