Syntactic Annotation for the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN)

This paper discusses the syntactic annotation of the Spoken Dutch Corpus, a Dutch/Flemish cooperative project to build an annotated corpus of about one thousand hours of continuous speech, amounting to roughly 10 million words. After a brief introduction to the project, we describe the kind of syntactic annotation we envisage (dependency structures) and how it is created (semi-automatically). We point out some peculiarities of spoken language, and we close by discussing some of the questions the corpus may help answer.
