Evaluation of Natural Language Tools for Italian: EVALITA 2007

EVALITA 2007, the first edition of the initiative devoted to the evaluation of Natural Language Processing tools for Italian, provided a shared framework in which participants’ systems could be evaluated on five different tasks: Part of Speech Tagging (organised by the University of Bologna), Parsing (organised by the University of Torino), Word Sense Disambiguation (organised by CNR-ILC, Pisa), Temporal Expression Recognition and Normalization (organised by CELCT, Trento), and Named Entity Recognition (organised by FBK, Trento). We believe that the diffusion of shared tasks and shared evaluation practices is a crucial step towards the development of resources and tools for Natural Language Processing. Experiences of this kind are a valuable contribution to the validation of existing models and data, allowing for consistent comparisons among approaches and among representation schemes. The good response obtained by EVALITA, both in the number of participants and in the quality of results, showed that pursuing such goals is feasible not only for English, but also for other languages.
