The IULA Spanish LSP Treebank

This paper presents the IULA Spanish LSP Treebank, a dependency treebank of over 41,000 sentences from five domains (Law, Economy, Computing Science, Environment, and Medicine), developed in the framework of the European project METANET4U. The dependency annotations were derived automatically from manually selected parses produced by an HPSG grammar. A deterministic conversion algorithm used the identifiers of the grammar rules to identify the heads, the dependents, and some dependency types that were transferred directly onto the dependency structure (e.g., subject, specifier, and modifier), and used the identifiers of the lexical entries to identify the argument-related dependency functions (e.g., direct object, indirect object, and oblique complement). The treebank can be accessed through a browser that provides concordance-based search functions and delivers the results in two formats: (i) a column-based format, in the style of the CoNLL-2006 shared task, and (ii) a dependency graph, in which each dependency relation is drawn as a directed arrow from the dependent node to the head node. The IULA Spanish LSP Treebank is the first technical corpus of Spanish annotated at the surface syntactic level following dependency grammar theory. The treebank is publicly and freely available from the META-SHARE platform under a Creative Commons CC-BY licence.
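To make the column-based output format concrete, the following minimal Python sketch reads a sentence in the ten-column layout of the CoNLL-2006 (CoNLL-X) shared task and lists each arc from dependent to head, mirroring the arrow direction of the dependency graph view. The example sentence, its part-of-speech tags, and the labels `SPEC`, `SUBJ`, `DOBJ`, and `ROOT` are illustrative assumptions, not the treebank's actual tagset, and the helper names `parse_conll` and `arcs` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """The first eight CoNLL-X columns; PHEAD and PDEPREL are ignored here."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int      # 0 means the token attaches to the artificial root
    deprel: str


def parse_conll(block: str) -> List[Token]:
    """Parse one sentence: one token per line, tab-separated columns."""
    tokens = []
    for line in block.strip().splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(cols)}")
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            cols[4], cols[5], int(cols[6]), cols[7]))
    return tokens


def arcs(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (dependent, relation, head) triples, dependent first."""
    forms = {t.id: t.form for t in tokens}
    forms[0] = "ROOT"
    return [(t.form, t.deprel, forms[t.head]) for t in tokens]


# Hypothetical sentence and labels, assumed for illustration only.
SAMPLE_ROWS = [
    ("1", "El", "el", "d", "da", "_", "2", "SPEC", "_", "_"),
    ("2", "tribunal", "tribunal", "n", "nc", "_", "3", "SUBJ", "_", "_"),
    ("3", "dictó", "dictar", "v", "vm", "_", "0", "ROOT", "_", "_"),
    ("4", "la", "el", "d", "da", "_", "5", "SPEC", "_", "_"),
    ("5", "sentencia", "sentencia", "n", "nc", "_", "3", "DOBJ", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS)

if __name__ == "__main__":
    for dependent, relation, head in arcs(parse_conll(SAMPLE)):
        print(f"{dependent} --{relation}--> {head}")
```

Keeping the head as an integer index rather than a word form follows the CoNLL-X convention, so repeated word forms in a sentence remain unambiguous; the sketch resolves indices to forms only for display.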
