Malayalam Clause Boundary Identifier: Annotation and Evaluation

Clause boundary identification plays a significant role in improving the performance of practical NLP systems. In this paper we address the automatic identification of various types of clausal structures in Malayalam, a Dravidian language. Sentences containing clauses were collected from tourism and health domain texts available on the Web. We discuss the annotation schema, the inter-annotator agreement for the various clause types, and the automatic identification of clause boundaries using conditional random fields (CRFs), a machine learning approach. To correct errors in the CRF tagging output, we apply linguistic rules. For inter-annotator agreement we use the kappa coefficient as the agreement statistic. The evaluation gave encouraging results.
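As an illustration of the agreement statistic mentioned above, the following is a minimal sketch of a kappa computation. It assumes two annotators and Cohen's formulation of kappa; the abstract does not say which variant was used or how many annotators took part. The function name `cohen_kappa` and the clause tags in the example (`B-RP`, `I-RP`, `B-COND`, `O`) are hypothetical and are not taken from the paper's annotation schema.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: exactly two annotators and one label per token.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    all_labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[l] * count_b[l] for l in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level clause tags from two annotators.
annotator_1 = ["B-RP", "I-RP", "O", "B-COND", "O", "O"]
annotator_2 = ["B-RP", "I-RP", "O", "O", "O", "O"]

print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (such as a non-boundary tag) dominates the data.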
