Dependency structure annotation in the IULA Spanish LSP Treebank

This paper presents the IULA Spanish LSP Treebank, an open-source treebank of over 40,000 sentences developed within the European project METANET4U. It is the first technical corpus of Spanish annotated at the surface syntactic level following dependency grammar theory. We describe the method used to create the resource and the linguistic annotations the treebank provides, illustrating them with examples and comparing the treebank with similar resources. We also report treebank statistics and evaluation results.
