Exploring self training for Hindi dependency parsing

In this paper we explore the effect of self-training on Hindi dependency parsing. We take a state-of-the-art Hindi dependency parser and apply self-training with a large raw corpus. We consider two types of raw corpus: one from the same domain as the training and test data, and one from a different domain. We also run an experiment in which we add a small amount of gold-standard data to the training set. Comparing these experiments, we show how adding a small amount of gold-standard data to the training data compares with adding a large amount of automatically parsed data for the Hindi parser.
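
As a rough illustration of the self-training setup described above, the sketch below trains a parser on gold-standard trees, parses raw sentences with it, and retrains on the union of gold and automatically parsed data. Everything here is an assumption made for illustration: the functions `train`, `parse`, and `self_train` are hypothetical, the data is invented, and the "parser" is a toy that only memorizes the most frequent head offset per POS tag. It is not the parser or the procedure used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical data format: a parsed sentence is a list of (pos_tag, head) pairs,
# where head is the 1-based index of the head token and 0 marks the root.
# A raw sentence is just a list of POS tags.


def train(treebank):
    """Toy learner: for each POS tag, remember its most frequent attachment."""
    counts = defaultdict(Counter)
    for sentence in treebank:
        for position, (tag, head) in enumerate(sentence, start=1):
            # Store either "root" or the signed distance to the head.
            attachment = "root" if head == 0 else head - position
            counts[tag][attachment] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in counts.items()}


def parse(model, tags):
    """Toy parser: attach each token using the remembered offset for its tag."""
    parsed = []
    for position, tag in enumerate(tags, start=1):
        attachment = model.get(tag, "root")  # unseen tags attach to the root
        head = 0 if attachment == "root" else position + attachment
        if not 1 <= head <= len(tags):
            head = 0  # fall back to the root when the offset leaves the sentence
        parsed.append((tag, head))
    return parsed


def self_train(gold, raw):
    """One round of self-training: parse raw data, then retrain on gold + auto."""
    base_model = train(gold)
    auto_parsed = [parse(base_model, tags) for tags in raw]
    return train(gold + auto_parsed), auto_parsed


# Invented example data (not from the paper).
gold = [[("NOUN", 2), ("VERB", 0)]]
raw = [["NOUN", "VERB"], ["NOUN", "NOUN", "VERB"]]

model, auto_parsed = self_train(gold, raw)
```

The gold-data experiment mentioned in the abstract would correspond to calling `train` on the original gold trees plus a few additional gold trees, so the two conditions differ only in where the extra training trees come from.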
