A hybrid approach for automatic clause boundary identification in Hindi

A complex sentence is easier to analyze once it has been divided into clauses than when it is treated as a whole. We present the task of clause identification in Hindi text. To the best of our knowledge, little work has been done on clause boundary identification for Hindi, which makes this task all the more important. We have built a hybrid system that achieves F1-scores of 90.804% and 94.697% for identifying clause starts and clause ends, respectively.
