Harvesting Dutch Trees: Syntactic Properties of Spoken Dutch

In this paper, we report on quantitative research into certain word order phenomena in Dutch, using the Spoken Dutch Corpus (CGN), a major new resource for research into contemporary spoken Dutch. After briefly introducing the primary data, the annotations added to them, and some of the tools for exploring both, we illustrate how the corpus can be used to answer certain linguistic questions about Dutch.
