Using an SVM Ensemble System for Improved Tamil Dependency Parsing

Dependency parsing has been shown to improve NLP systems in certain languages and in many cases helps achieve state-of-the-art results in NLP applications, particularly for free word order languages. Morphologically rich languages often have little training data available, or require much more of it because of their larger lexicons. This paper examines a new approach for morphologically rich languages that start with little training data. Using Tamil as our test language, we create 9 dependency parse models from a limited amount of training data. We then train an SVM classifier using only the agreements among these models as features, and apply it to edge-by-edge decisions to form an ensemble parse tree. Because the features are model agreements only, the method remains language independent and applicable to a wide range of morphologically rich languages. We show a statistically significant 5.44% improvement over the average dependency model and a statistically significant 0.52% improvement over the best individual system.
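The edge-by-edge combination described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the pairwise encoding of model agreements, the function names, and the `choose_model` interface are all hypothetical, and `most_agreed_model` is a simple placeholder standing in for the trained SVM classifier so that the sketch runs without a learning library.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def agreement_features(heads: Sequence[int]) -> List[int]:
    """heads[i] is the head index that model i predicts for one token.

    Assumed encoding: one binary feature per pair of models,
    1 if the two models predict the same head, else 0.
    """
    return [int(heads[a] == heads[b])
            for a, b in combinations(range(len(heads)), 2)]


def ensemble_heads(model_parses: Sequence[Sequence[int]],
                   choose_model: Callable[[List[int]], int]) -> List[int]:
    """Build an ensemble parse one edge (token -> head) at a time.

    model_parses[m][t] is the head model m assigns to token t.
    choose_model maps agreement features to the index of the model
    whose edge should be kept (the role a trained classifier would play).
    """
    n_tokens = len(model_parses[0])
    result = []
    for t in range(n_tokens):
        heads = [parse[t] for parse in model_parses]
        chosen = choose_model(agreement_features(heads))
        result.append(heads[chosen])
    return result


def most_agreed_model(n_models: int) -> Callable[[List[int]], int]:
    """Placeholder for the SVM: pick the model that agrees with the
    most other models (ties go to the lowest model index)."""
    pairs = list(combinations(range(n_models), 2))

    def choose(feats: List[int]) -> int:
        votes = [0] * n_models
        for (a, b), f in zip(pairs, feats):
            votes[a] += f
            votes[b] += f
        return votes.index(max(votes))

    return choose


# Toy input: 3 models, 3 tokens, heads given as token indices (0 = root).
toy_parses = [[2, 0, 2], [2, 0, 1], [3, 0, 2]]
combined = ensemble_heads(toy_parses, most_agreed_model(3))
```

With nine models, this pairwise encoding yields 36 features per token. Note that the sketch selects each edge independently and does not check that the combined edges form a well-formed tree.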
