ParGramBank: The ParGram Parallel Treebank

This paper discusses the construction of a parallel treebank that currently covers ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars developed within the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and language families, and this output forms the basis of a parallel treebank covering a diverse set of phenomena. The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. We thus present a unique, multilayered parallel treebank that represents more languages, and more types of languages, than are available in other treebanks, that encodes deep linguistic knowledge, and that allows sentences to be aligned at several levels: dependency structures, constituency structures, and POS information.
