Paper Information - Building a Treebank for Italian: a Data-driven Annotation Schema

Building a Treebank for Italian: a Data-driven Annotation Schema

Many natural language researchers are currently turning their attention to treebank development, aiming for accuracy and corpus data coverage in their representation formats. This paper presents a data-driven annotation schema developed for an Italian treebank that ensures data coverage and consistency in the annotation of linguistic phenomena. The schema is a dependency-based format centered on the notion of predicate-argument structure, augmented with traces to represent discontinuous constituents. Treebank development involves an annotation process performed by a human annotator assisted by an interactive parsing tool that incrementally builds a syntactic representation of the sentence. To increase the syntactic knowledge of this parser, a specific data-driven strategy has been applied. We describe the cyclical development of the annotation schema, highlighting the richness and flexibility of the format, and we present some representational issues.

[1] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[2] Wojciech Skut, et al. Automation of Treebank Annotation, 1998, CoNLL.

[3] Oliviero Stock, et al. Parsing with Flexibility, Dynamic Strategies, and Idioms in Mind, 1989, CL.

[4] Guido Boella, et al. Automatic refinement of Linguistic rules for tagging, 1998.

[5] Kenneth Ward Church, et al. Introduction to the Special Issue on Computational Linguistics Using Large Corpora, 1993, Comput. Linguistics.

[6] Richard Hudson, et al. English word grammar, 1995.

[7] Ann Bies, et al. The Penn Treebank: Annotating Predicate Argument Structure, 1994, HLT.

[8] Wojciech Skut, et al. An Annotation Scheme for Free Word Order Languages, 1997, ANLP.

[9] Luigi Rizzi, et al. Issues in Italian Syntax, 1981.

[10] Eva Hajicová, et al. The Prague Dependency Tree Bank: How Much of the Underlying Syntactic Structure Can Be Tagged Automatically?, 1999, Prague Bull. Math. Linguistics.

[11] Vincenzo Lombardo, et al. A formal theory of dependency syntax with non-lexical units, 1999.

[12] Vincenzo Lombardo, et al. Incrementality and Lexicalism: A Treebank Study, 2002.

[13] Wojciech Skut, et al. Tagging Grammatical Functions, 1997, EMNLP.

[14] Wojciech Skut, et al. A Linguistically Interpreted Corpus of German Newspaper Text, 1998, LREC.