Graded Constraints on English Word Forms

McClelland & Vander Wyk

We present evidence that graded constraints determine the occurrence rates of the different rhyme types found in the ensemble of simple uninflected words in the English language. The rhyme types are defined in terms of vowel length (long vs. short), presence of particular post-vocalic elements, and their place of articulation. The rhyme types in the corpus (uninflected monosyllabic lemmas found in CELEX, which uses Southern British ‘Received Pronunciation’) conform to a template defined by a small number of absolute or categorical constraints. Among those forms consistent with the template, several graded constraints are identified, including constraints favoring short vowels, fewer segments, coronal places of articulation, and, when stops are present, absence of voicing. Such constraints induce a partial ordering over the expected rates of occurrence of different rhyme types; these have as special cases a pattern of implications for whether or not a form occurs at all (if form X occurs, then form Y should occur; if Z does not occur, then W should not occur). The constraints can be incorporated into a monotonic function characterizing the expected frequencies of occurrence of different rhyme types. Observed occurrence rates are better explained by a linear accumulation of constraints than by a multiplicative accumulation function. We also find that the constraints favoring coronals and short vowels are amplified when combined with other constraints, and are stronger in words of higher token frequency.

One goal of linguistic inquiry is to characterize the pattern of occurrence of linguistic forms. It has been common to treat the occurrence of candidate forms of a given type as an all-or-nothing affair: a form-type either is or is not acceptable, and so forms of the given type either can or cannot occur.
Under this approach, the theorist attempts to provide a formal system, consisting of a structural framework and a set of rules or constraints on possible forms, which makes it possible to provide a perspicuous account of which types of forms are acceptable and which are not. The constructs used within the framework (e.g., the rules or constraints) operate in a categorical fashion: they rule candidates in or out, without allowing for graded differences in the relative degree of acceptability of different forms. For example, Harris (1994) provides a structural framework for characterizing the rhymes of English word-final stressed syllables and a set of statements cast within that framework that specify constraints on which rhymes can occur. One such statement is the following: “for rhymes of the form /VVlX/, where X is a stop, X must be coronal.” This accounts for the non-occurrence of rhymes like /i:lg/ as in ‘fielg’ while allowing rhymes like /i:ld/ as in ‘field’ (the notation VV refers to any of a set of English vowels that Harris calls ‘long’, among which /i:/ is included).

In the present article, we make the case that considering only whether a form can or cannot occur ignores systematic facts about language, namely facts about differences in rates of occurrence of different forms. Indeed, we suggest that forms have a graded tendency to occur, conceived as a continuous underlying variable. We base our argument on data from a small subdomain, namely the rhymes occurring in monomorphemic monosyllabic word lemmas found in Southern British English. We rely on a count of the number of words containing rhymes of each possible type, based on the CELEX lemma corpus (Baayen, Piepenbrock, & van Rijn, 1993), with proper nouns and forms of questionable morphological status (such as first and stealth) removed.
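To make the all-or-nothing character of such statements concrete, the statement quoted above from Harris (1994) can be sketched as a simple predicate. This is only an illustration under stated assumptions, not Harris's formalism or our analysis code: the function names, the representation of a rhyme as a vowel plus a tuple of coda segments, and the one-member long-vowel set (only /i:/ is named in the text) are hypothetical choices made for the example.

```python
# Illustrative sketch only: a categorical (all-or-nothing) constraint as a predicate.
# Assumption: a rhyme is represented as (vowel, coda), with coda a tuple of segments.

# Placeholder set: the text names only /i:/ as an example of a 'long' vowel.
LONG_VOWELS = {"i:"}

# Standard place classification of the English oral stops.
CORONAL_STOPS = {"t", "d"}
NONCORONAL_STOPS = {"p", "b", "k", "g"}
STOPS = CORONAL_STOPS | NONCORONAL_STOPS


def violates_vvlx(vowel, coda):
    """Return True if the rhyme has the form /VVlX/, X a stop, and X is not coronal."""
    if vowel not in LONG_VOWELS:
        return False  # the statement only concerns long-vowel rhymes
    if len(coda) != 2 or coda[0] != "l":
        return False  # the statement only concerns codas of the form /lX/
    x = coda[1]
    return x in STOPS and x not in CORONAL_STOPS


def is_admissible(vowel, coda):
    """Categorical verdict: the rhyme is ruled in or out, with nothing in between."""
    return not violates_vvlx(vowel, coda)


print(is_admissible("i:", ("l", "d")))  # 'field' -> True
print(is_admissible("i:", ("l", "g")))  # 'fielg' -> False
```

A predicate of this kind returns only True or False, so it cannot express that one admissible rhyme type is used far more often than another; that limitation is the point at issue in what follows.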
We present evidence that there are systematic gradations in the observed rates of occurrence of different types of rhymes, strongly suggesting graded differences among forms in their underlying acceptability or tendency to occur. We then examine the factors that appear to govern the pattern of variation we see in the rate of occurrence of different forms. We find that much of the structure in these patterns can be accounted for by positing a small number of graded constraints. The constraints create strong implications for the relative rates of occurrence of candidate forms; the binary categorization of forms into those that do occur vs. those that do not occur is seen as a consequence of the cumulative impact of the graded constraints.

We begin with an informal inspection of a subset of the data. Table 1 shows the average rate of occurrence of several types of rhymes containing at least one stop consonant in the corpus. Each rhyme type encompasses the set of rhymes all containing the indicated set of coda consonants and one of a set of vowels classified (following Harris, 1994) as ‘long’ or ‘short’. Although no two vowels are strictly equivalent, a division into long (VV) and short (V) vowels has been shown to be useful in capturing a number of features of English rhymes (e.g., Hammond, 1999; Harris, 1994). The table distinguishes between rhymes on the basis of vowel length, voicing of the stop and any other obstruents, and the place of articulation of the coda stop. For each combination of these variables, the rate of occurrence when the stop occurs alone is given in the first column, followed by its rate of occurrence with what we will call an embellishment: either a pre-stop liquid (designated l_), homorganic nasal (n_), or coronal fricative (s_ or z_), or a post-stop coronal fricative (_s or _z) or second coronal stop (_t or _d).
The rates of occurrence are given as the average number of words with the given coda per vowel of the indicated type. Thus, the entry for the unvoiced coda /t/ occurring alone with a short vowel is 22.6, indicating that rhymes of this type, namely /Vt/, occur 22.6 times per short vowel in our corpus of monosyllabic English lemmas.

Several things are apparent in the table. First, there is a wide range of variation in the rate of occurrence of different forms. The rhyme /Vt/ occurs 22.6 times per vowel, whereas the rhyme /VVks/ occurs only 0.2 times per vowel (in the two words ‘hoax’ and ‘coax’). Taking notice only of binary distinctions between forms that do and do not occur would therefore miss a more than hundred-fold range of variation (22.6 vs. 0.2 words per vowel) in the rates of occurrence of different rhyme types.

Second, the variation in rate of occurrence is systematic. For example, both coda voicing and vowel length have an effect on how often a given rhyme is used: holding other factors constant, voiced codas and long vowels tend to result in lower rates of occurrence. In addition, holding other factors equal, coronal consonants tend to occur at higher rates than their non-coronal counterparts. The presence of any one of the indicated embellishments also tends to reduce the rate of occurrence, compared to the corresponding unembellished form.

Third, among the factors that influence the systematic differences in occurrence rate, no single constraint is dominant, and it is apparent that each constraint adds an additional penalty. For example, compared to a given unvoiced short-vowel rhyme (/Vp/, /Vk/ or /Vt/), both the voiced short-vowel counterpart (/Vb/ et al.) and the unvoiced long-vowel counterpart (/VVp/ et al.) tend to occur at lower rates. Furthermore, the rhymes that combine a voiced coda with a long vowel (/VVb/ et al.) are even less common than either the voiced short-vowel rhymes or the unvoiced long-vowel rhymes.
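The way accumulating penalties order these four rhyme types can be sketched in a few lines. The sketch below is a hedged illustration, not the analysis reported in this paper: it restricts attention to the labial series named in the text (/Vp/, /Vb/, /VVp/, /VVb/), treats each rhyme type as the set of graded constraints it violates, and uses hypothetical names (`VIOLATIONS`, `at_least_as_frequent`, `comparable`). One rhyme type is expected to occur at least as often as another when its violations are a subset of the other's; types whose violation sets are not nested are left unordered.

```python
# Illustrative sketch only: graded constraints induce a partial ordering
# over expected occurrence rates via subset relations on violation sets.

# Violations per rhyme type, following the text: voiced codas and long vowels
# are each penalized. Labels and names here are assumptions for the example.
VIOLATIONS = {
    "Vp": frozenset(),                     # unvoiced coda, short vowel
    "Vb": frozenset({"voiced"}),           # voiced coda, short vowel
    "VVp": frozenset({"long"}),            # unvoiced coda, long vowel
    "VVb": frozenset({"voiced", "long"}),  # voiced coda, long vowel
}


def at_least_as_frequent(a, b):
    """True if type a is expected to occur at least as often as type b,
    i.e. every constraint violated by a is also violated by b."""
    return VIOLATIONS[a] <= VIOLATIONS[b]


def comparable(a, b):
    """True if the violation sets are nested, so the ordering makes a prediction."""
    return at_least_as_frequent(a, b) or at_least_as_frequent(b, a)


# All ordered pairs (a, b), a != b, where a is predicted to be at least as frequent as b.
ordered_pairs = [
    (a, b)
    for a in VIOLATIONS
    for b in VIOLATIONS
    if a != b and at_least_as_frequent(a, b)
]

for a, b in ordered_pairs:
    print(f"/{a}/ is expected to occur at least as often as /{b}/")

print(comparable("Vb", "VVp"))  # False: one violation each, neither set contains the other
```

The ordering is only partial: /Vb/ and /VVp/ each violate one constraint, and nothing in the subset relation says which penalty is larger. Settling that requires estimating the strength of each constraint, which a purely categorical account has no way to express.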
Within all combinations of coda voicing and vowel length, embellished forms occur at lower rates than their unembellished counterparts. Finally, in many cases, rhymes containing coronal stops occur more frequently than their non-coronal counterparts. The overall pattern indicates that graded penalties against long vowels, voiced codas, non-coronal places of articulation, and each of the different types of embellishments accumulate, with each violation contributing to the total penalty, thereby reducing the rate of occurrence of any given rhyme type relative to its counterparts violating fewer of these constraints.

Footnote 1: Not all English vowels contribute to the entries in the table, and forms with a consonant preceded or followed by a fricative other than /s/ have also been excluded. The vowel restrictions and the few other cases involving fricatives are discussed in the fuller analysis presented below.

This informal analysis, though far from complete, demonstrates several of the fundamental points that will be explored in the remainder of the paper. There is systematic variation in the occurrence rates of word forms, and this variation can be fruitfully described using a small set of graded parameters, with no single parameter showing absolute dominance over others.

Relation to Other Work

We know of little formal theory directed at the explanation of graded patterns in the occurrence rates of different forms. However, differences in forms’ characteristics have been shown to produce graded effects in a variety of linguistic and non-linguistic tasks such as goodness judgments (Coleman & Pierrehumbert, 1997; Frisch, Broe, & Pierrehumbert, 2004; Frisch, Large, & Pisoni, 2000), nonword repetition (Vitevitch & Luce, 1998), speech errors (Goldrick, 2004), phoneme identification (Pitt & McQueen, 1998), and recognition memory (Frisch et al., 2000).
Some work has discussed differences in rates of occurrence (Harris, 1994; Kessler & Treiman, 1997). But attempts to develop a formal framework that characterizes which forms can and cannot occur have not fully integrated these graded differences, and the role of the graded accumulation of constraint violations has not been explicitly addressed. For example, Harris (1994) often alludes to what he calls preferences, e.g., for coronal relative to non-coronal rhyme types, but does not systematically consider whether the patterns

[1] Geoffrey E. Hinton, et al. Schemata and Sequential Thought Processes in PDP Models, 1986.

[2] David B. Pisoni, et al. Perception of Wordlikeness: Effects of Segment Probability and Length on the Processing of Nonwords, 2000, Journal of Memory and Language.

[3] Rebecca Treiman. Distributional constraints and syllable structure in English, 1988.

[4] R. H. Baayen, et al. The CELEX Lexical Database (CD-ROM), 1996.

[5] M. Studdert-Kennedy, et al. Self-organizing processes and the explanation of phonological universals, 1983.

[6] R. Treiman, et al. In Defense of an Onset-Rime Syllable Structure for English, 1995, Language and Speech.

[7] P. Luce, et al. When Words Compete: Levels of Processing in Perception of Spoken Words, 1998.

[8] Joan L. Bybee, et al. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change, 2002, Language Variation and Change.

[9] R. Treiman, et al. Syllable Structure and the Distribution of Phonemes in English Syllables, 1997.

[10] Matthew Goldrick, et al. Phonological features and phonotactic constraints in speech production, 2004.

[11] Luigi Burzio. Missing players: Phonology and the past-tense debate, 2002.

[12] Ulrich H. Frauenfelder, et al. Neighborhood Density and Frequency Across Languages and Modalities, 1993.

[13] P. Boersma, et al. Empirical Tests of the Gradual Learning Algorithm, 2001, Linguistic Inquiry.

[14] Paul Smolensky, et al. Information processing in dynamical systems: foundations of harmony theory, 1986.

[15] Rebecca Treiman, et al. English speakers' sensitivity to phonotactic patterns, 2000.

[16] Joan Bybee. Joan Bybee: Phonology and Language Use, 2004, Phonetica.

[17] John Coleman, et al. Stochastic phonological grammars and acceptability, 1997, SIGMORPHON@EACL.

[18] K. Gegenfurtner. PRAXIS: Brent’s algorithm for function minimization, 1992.

[19] Melissa A. Redford, et al. Constrained Emergence of Universals and Variation in Syllable Systems, 2001, Language and Speech.

[20] John Harris, et al. English Sound Structure, 1994.

[21] Pierre Delattre, et al. Some Factors of Vowel Duration and Their Cross‐Linguistic Validity, 1962.

[22] Janet B. Pierrehumbert, et al. Similarity Avoidance and the OCP, 2004.

[23] T. Landauer, et al. Structural differences between common and rare words: Failure of equivalence assumptions for theories of word recognition, 1973.

[24] P. Smolensky, et al. Optimality Theory: Constraint Interaction in Generative Grammar, 2004.

[25] Gunnar Fant, et al. Acoustical Analysis of Speech, 2007.

[26] Janet B. Pierrehumbert, et al. Word Games and Syllable Structure, 1995.

[27] R. J. Williams, et al. The logic of activation functions, 1986.

[28] Michael Hammond, et al. The phonology of English: a prosodic optimality-theoretic approach, 1999.

[29] M. Pitt, et al. Is Compensation for Coarticulation Mediated by the Lexicon, 1998.