I-CAB: the Italian Content Annotation Bank

In this paper we present work in progress on the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at several levels. The first level consists of temporal expressions. The second level consists of different types of entities (i.e. persons, organizations, locations and geo-political entities). The third level consists of relations between entities (e.g. the affiliation relation connecting a person to an organization). So far, I-CAB has been manually annotated with temporal expressions, person entities and organization entities. Because we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing existing markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expression Recognition and Normalization tasks. Since the ACE guidelines were originally developed for English, part of the effort consisted in adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to cover a wider range of entities, such as conjunctions.