Evaluating Italian Parsing Across Syntactic Formalisms and Annotation Schemes

This paper reports results on how syntactic representations and parsing methodologies affect the performance of parsers for Italian. Italian has a rich morphology, especially in its verbal suffixes, which can give a parser useful cues for choosing the correct analysis. With respect to syntactic representation, the experiments are based on an Italian treebank that is released in both a dependency and a constituency format, each annotated at different levels of specificity. The two paradigms are compared, and differences in how specifically they mark certain syntactic phenomena are highlighted. Statistical parsers were then evaluated on this treebank. The results show that both the representation format and the parsing approach strongly affect performance, which in some cases comes very close to the state of the art for English and in others differs from it drastically.
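The specific metrics are not stated here. To illustrate why scores from the two formalisms have to be read differently, the following minimal Python sketch computes the two standard families of measures: attachment scores (UAS/LAS) for dependency output and PARSEVAL-style labeled bracket precision, recall and F1 for constituency output. The function names, the `(head, label)` and `(label, start, end)` encodings, and the toy labels are hypothetical; the bracket scorer is simplified and does not reproduce evalb conventions such as punctuation or preterminal handling.

```python
from collections import Counter


def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two equal-length lists of (head, label) pairs.

    Hypothetical encoding: one pair per token, in sentence order;
    head 0 denotes the artificial root, as in CoNLL-style files.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    total = len(gold)
    # Unlabeled: only the head must match.
    unlabeled = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # Labeled: head and dependency label must both match.
    labeled = sum(g == p for g, p in zip(gold, predicted))
    return unlabeled / total, labeled / total


def bracket_f1(gold, predicted):
    """Simplified labeled bracket precision, recall and F1.

    Each analysis is a list of (label, start, end) constituents; brackets
    are compared as multisets, so duplicates are matched at most once.
    """
    gold_c, pred_c = Counter(gold), Counter(predicted)
    # Counter '&' keeps the minimum count of each bracket: the matched ones.
    matched = sum((gold_c & pred_c).values())
    precision = matched / sum(pred_c.values()) if pred_c else 0.0
    recall = matched / sum(gold_c.values()) if gold_c else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy sentence "Il gatto dorme" ("The cat sleeps"), hypothetical labels.
    gold_dep = [(2, "det"), (3, "subj"), (0, "root")]
    pred_dep = [(2, "det"), (3, "obj"), (0, "root")]  # one wrong label
    print(attachment_scores(gold_dep, pred_dep))

    gold_con = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)]
    pred_con = [("S", 0, 3), ("NP", 0, 2), ("NP", 2, 3)]  # one wrong bracket
    print(bracket_f1(gold_con, pred_con))
```

In the toy example every head is correct but one label is wrong, so UAS is 1.0 while LAS is 2/3; one mislabeled constituent likewise yields 2/3 bracket precision, recall and F1. Because the two scores count different units (tokens versus constituents), figures obtained in the two paradigms cannot be compared directly.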
