Less is More? Towards a Reduced Inventory of Categories for Training a Parser for the Italian Stanford Dependencies

Stanford Dependencies (SD) are now a de facto standard for dependency annotation. This paper explores the pros and cons of different strategies for generating SD-annotated Italian texts to enrich the existing Italian Stanford Dependency Treebank (ISDT). To this end, two setups of a statistical parser (DeSR) are compared: one trained on a simpler resource, the augmented version of the Merged Italian Dependency Treebank (MIDT+), whose output is automatically converted to SD, and one trained directly on ISDT. Experiments testing the reliability and effectiveness of the two strategies show that the parser trained on the reduced dependency repertoire, whose output can be easily converted to SD, performs slightly better than the parser trained directly on ISDT. A non-negligible advantage of the first strategy is that semi-automatic extensions of the training resource can be carried out more easily and consistently with a reduced dependency tag set. Preliminary experiments on generating the collapsed and propagated SD representation are also reported.
