Harmonization and merging of two Italian dependency treebanks

This paper describes the methodology currently being defined for building a “Merged Italian Dependency Treebank” (MIDT) from existing resources. It reports the results of a case study on two available dependency treebanks, TUT and ISST–TANL. It discusses the issues that arose when comparing the annotation schemes underlying the two treebanks, with particular emphasis on defining a set of linguistic categories to serve as a “bridge” between the two schemes. The de facto standard CoNLL format is used for encoding.
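
The idea of a “bridge” between two annotation schemes can be illustrated with a minimal sketch. It assumes a 10-column CoNLL-X style row, with the dependency relation in the eighth column, and rewrites scheme-specific relation labels into shared bridge categories. The label tables, the sample rows, and the `to_bridge` helper are hypothetical placeholders chosen for illustration; they are not the actual TUT or ISST–TANL tagsets, nor the mapping defined for MIDT.

```python
# Hypothetical label tables: illustrative placeholders only, NOT the real
# TUT / ISST-TANL tagsets and NOT the mapping defined for MIDT.
SCHEME_A_TO_BRIDGE = {"A-SUBJ": "subject", "A-OBJ": "object"}
SCHEME_B_TO_BRIDGE = {"b_subj": "subject", "b_obj": "object"}

# Zero-based index of the DEPREL column in a 10-column CoNLL-X row
# (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL).
DEPREL = 7


def to_bridge(conll_line, table, fallback="unmapped"):
    """Return the row with its DEPREL replaced by the bridge category.

    Labels missing from the table are replaced by `fallback`, so that
    unmapped cases stay visible instead of being silently kept.
    """
    cols = conll_line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    cols[DEPREL] = table.get(cols[DEPREL], fallback)
    return "\t".join(cols)


# The same token as it might appear under two different (made-up) schemes.
row_a = "1\tMaria\tMaria\tN\tNP\t_\t2\tA-SUBJ\t_\t_"
row_b = "1\tMaria\tMaria\tS\tSP\t_\t2\tb_subj\t_\t_"

bridged_a = to_bridge(row_a, SCHEME_A_TO_BRIDGE)
bridged_b = to_bridge(row_b, SCHEME_B_TO_BRIDGE)
```

After conversion, both rows carry the same relation label, so they can be compared or merged. A flat one-to-one table like this only covers the simplest case; any difference between schemes that changes heads or tree structure would need more than a label lookup.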
