Paper information - Syntactic Analysis in the Spoken Dutch Corpus (CGN)

The paper describes the syntactic annotation of the Spoken Dutch Corpus (“Corpus Gesproken Nederlands” or CGN), a Dutch-Flemish project (1998-2003) that aimed to collect, describe and annotate ten million words of spoken Dutch. The first part discusses the background of the parsing strategy and some details of how the parsing process was actually implemented. The second part discusses some examples of practical applications of the parsing results.

[1] W.J.M. Haeseryn. Algemene Nederlandse spraakkunst, 1997.

[2] Walter Daelemans, et al. Lemmatisation and morphosyntactic annotation for the spoken Dutch corpus, 1999, CLIN.

[3] Thorsten Brants, et al. Cascaded Markov Models, 1999, EACL.

[4] A. van der Wouden, et al. Dat had niet zo gehoeven: Modaliteit en negatie in de nieuwe ANS, 1999.

[5] Lou Boves, et al. Experiences from the Spoken Dutch Corpus Project, 2002, LREC.

[6] C. Pollard, et al. Center for the Study of Language and Information, 2022.

[7] Nelleke Oostdijk, et al. The Spoken Dutch Corpus, 2000.

[8] Douglas Biber, et al. Variation across speech and writing: Methodology, 1988.

[9] Michael Moortgat, et al. Syntactic Annotation for the Spoken Dutch Corpus Project (CGN), 2000, CLIN.

[10] A. van der Wouden, et al. Partikels: Naar een partikelwoordenboek voor het Nederlands, 2002.

[11] Michael Moortgat, et al. CGN to Grail: Extracting a Type-logical Lexicon From the CGN Annotation, 2000, CLIN.

[12] Nelleke Oostdijk, et al. The Spoken Dutch Corpus. Overview and First Evaluation, 2000, LREC.

[13] Wojciech Skut, et al. An Annotation Scheme for Free Word Order Languages, 1997, ANLP.

[14] Frank Van Eynde. Part of Speech Tagging en Lemmatisering, 2003.

[15] Geoffrey Nunberg, et al. The linguistics of punctuation, 1990.

[16] J. D. Vries. Onze Nederlandse spreektaal, 2001.