Natural Language Processing Across Time: An Empirical Investigation on Italian

In this paper, we study how existing natural language processing tools for Italian perform on ancient texts. The first goal is to understand to what extent such tools can be used "as they are" for the automatic analysis of old literary works. Although NLP tools for Italian achieve good performance today, it is not clear whether they can be used successfully in the humanities to support the critical study of historical works. Our analysis shows that the tools' performance varies systematically across time periods and within literary movements. As a second goal, we verify whether simple customization methods can improve the tools' performance on old works.
