Applying automatically parsed corpora to the study of language variation

In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets obtainable through automatic annotation can drive further research in this direction by providing sufficient data for the increasingly complex models used to describe variation. We demonstrate this by replicating and extending a previous quantitative variation study that used manually and semi-automatically annotated data. We show that, although limitations of the existing automatic annotation prevent a complete replication, we can draw at least the same conclusions as the original study. In addition, we demonstrate the flexibility of this method by using additional data to extend the findings to related linguistic constructions and to another text domain.
