ParTUT: The Turin University Parallel Treebank

In this paper, we introduce an ongoing project for the development of a parallel treebank for Italian, English and French. The treebank is annotated in the dependency format designed for the Turin University Treebank (TUT), hence the name Par(allel)TUT. The project aims to create a resource that is useful in particular for translation research. Therefore, beyond continually enriching the treebank with new and heterogeneous data, so as to build a dynamic and balanced multilingual treebank, we devote the current stage of the project to designing a tool for aligning the data that takes into account the syntactic knowledge annotated in this kind of resource. The paper focuses in particular on the study of translational divergences and their implications for the development of the alignment tool. It provides an overview of the treebank, covering its current content and the peculiarities of the annotation format, and describes the classes of translational divergences that can be encountered in the treebank, together with a proposal for their alignment.
