Large Scale Syntactic Annotation of Written Dutch: Lassy

This chapter presents the Lassy Small and Lassy Large treebanks, together with related tools and applications. Lassy Small is a corpus of written Dutch texts (1,000,000 words) whose syntactic annotation has been manually verified and corrected. Lassy Large is a much larger corpus (over 500,000,000 words) that has been syntactically annotated fully automatically. In addition, various browsing and search tools for syntactically annotated corpora have been developed and made available. Their potential for applications in corpus linguistics and information extraction is illustrated and evaluated in a series of case studies.
