The Decathlon Model of Empirical Syntax

This article summarizes the findings of some of our studies of the data base of syntactic theory, contrasting the characteristics of frequency data and judgement data. Examination of frequency data reveals that the factors affecting its production interact competitively and probabilistically. This contrasts strongly with the patterns observed in judgement data, which point to a system in which violations of constraints produce negative weightings on form/meaning pairs. Since both data types are the result of human linguistic processing, we present a model of the architecture that such a system might have in order to produce such contrasting data. This Decathlon Model has two modules: Constraint Application and Output Selection. The first is blind and exceptionless, and applies violation costs cumulatively (Keller 2000); the second is competitive and probabilistic. This constrains frameworks of syntactic explanation: an empirically adequate grammar must include gradient well-formedness, specify constraint violation costs, and distinguish between the application of rules and the selection of outputs.

This paper reports our investigations into the data base of syntactic theory, specifically addressing the similarities and differences between corpus data and judgements, and sketching the implications of our findings for the construct of grammaticality and the architecture of the grammar. The motivation for these studies was a dissatisfaction with the state of affairs in syntax, in which, for example, two syntacticians can look at the same phenomenon and come up with widely differing analyses of what is going on. Another disappointment is the lack of any real forward movement in theory: alternative analyses seem to succeed each other more due to fashion than due to falsification. We might say that syntactic description, let alone syntactic explanation, is underdetermined by its data base.
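The two-module architecture described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the constraint names, the numeric violation costs, and the use of a softmax to turn scores into selection probabilities are not taken from the source, which states only that Constraint Application is blind, exceptionless and cumulative, and that Output Selection is competitive and probabilistic.

```python
import math
import random

# Hypothetical constraints and violation costs (illustrative values only;
# the source does not give any numbers).
VIOLATION_COSTS = {"constraint_a": 1.0, "constraint_b": 2.5}


def constraint_application(violations):
    """Module 1: apply every constraint blindly and sum the costs.

    `violations` maps a constraint name to the number of times a candidate
    violates it. Costs accumulate; no constraint is ever skipped or overridden.
    The result is a well-formedness score (0 is best, more negative is worse).
    """
    return -sum(VIOLATION_COSTS[name] * count for name, count in violations.items())


def selection_probabilities(scores, temperature=1.0):
    """Turn scores into output probabilities (softmax is an assumption here)."""
    best = max(scores.values())
    # Subtract the best score before exponentiating for numerical stability.
    weights = {c: math.exp((s - best) / temperature) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def output_selection(scores, rng, temperature=1.0):
    """Module 2: candidates compete; one is chosen probabilistically."""
    probs = selection_probabilities(scores, temperature)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


# Two hypothetical competing structures for the same meaning.
scores = {
    "structure_1": constraint_application({}),
    "structure_2": constraint_application({"constraint_b": 1}),
}
chosen = output_selection(scores, random.Random(0))
```

In this sketch the scores from the first module play the role of judgement data (each candidate receives a score independently of its competitors), while repeated calls to `output_selection` play the role of frequency data, where a candidate's chances of being produced depend on how it compares with the alternatives.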
In fact most data feeding into syntactic theory has significant flaws: it is fuzzy, it reflects multiple factors, only some of which are relevant to theory, and perhaps worst of all, these factors are poorly understood and not clearly

[1] Christopher D. Manning, et al. Probabilistic Syntax, 2002.

[2] Sam Featherston, et al. Bridge verbs and V2 verbs – the same thing in spades?, 2004.

[3] A. Sorace, et al. Magnitude Estimation of Linguistic Acceptability, 1996.

[4] P. Smolensky, et al. Optimality Theory: Constraint Interaction in Generative Grammar, 2004.

[5] S. Link, et al. Bias in Quantifying Judgments, 1989.

[6] Sam Featherston, et al. Empty Categories in Sentence Processing, 2001.

[7] Frank Keller, et al. Using the Web to Overcome Data Sparseness, 2002, EMNLP.

[8] Sam Featherston, et al. Universals and grammaticality: wh-constraints in German and English, 2005.

[9] Christopher D. Manning, et al. Soft Constraints Mirror Hard Constraints: Voice and Person in English and Lummi, 2002.

[10] Sam Featherston. Coreferential objects in German: Experimental evidence on reflexivity, 2002.

[11] P. Boersma, et al. Empirical Tests of the Gradual Learning Algorithm, 2001, Linguistic Inquiry.

[12] Lisa McNair. Papers from the Parasession on Theory and Data in Linguistics, April 11-13, 1996, 1996.

[13] G. Müller. Optimality, markedness, and word order in German, 1999.

[14] Carson T. Schütze. The empirical base of linguistics: Grammaticality judgments and linguistic methodology, 1998.

[15] Frank Keller, et al. Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality, 2001.

[16] William Labov. When Intuitions Fail, 2003.