Phonaesthemes: A Corpus-Based Analysis

Katya Otis (kotis@northwestern.edu)
Department of Psychology, Northwestern University
2029 Sheridan Road, Evanston, IL 60208 USA

Eyal Sagi (ermon@northwestern.edu)
Department of Psychology, Northwestern University
2029 Sheridan Road, Evanston, IL 60208 USA

Abstract

The association between sound and meaning is commonly thought of as symbolic and arbitrary. While this appears to be mostly correct, there is some evidence that specific phonetic groupings can be indicative of word meaning. In this paper we present a corpus-based method that can be used to test whether such an association exists in a given corpus for a specified phonetic grouping. The results we obtain using this method are compared with other empirical findings in the field and its implications are discussed.

Keywords: Corpus analysis, Computational linguistics, Phonaesthemes, Phonetics, Psycholinguistics, Sound-Meaning association.

It is a popular intuition that words with similar sounds also mean similar things. There is a long tradition of belief in the association between phonetic clusters and semantic clusters going back at least as far as Wallis’ grammar of English (Wallis, 1699). Morphemes form one such well-known cluster, but other sub-morphemic phonetic clusters that contribute to the meaning of the word as a whole have also been hypothesized (Firth, 1930; Jakobson & Waugh, 1979).

Anthropologists have documented sound symbolism in many languages (Blust, 2003; Nuckolls, 1999; Ramachandran & Hubbard, 2001), but its role as a purely linguistic phenomenon is still unclear. Moreover, the Saussurean notion of the arbitrary relationship between the sign’s form and its referent is a matter of dogma for most linguists (Hockett, 1960). This makes the study of words that do participate in predictable sound-meaning mappings all the more important, since, under the framework of contemporary linguistics, it is difficult to explain how these patterns come to be, or why they might survive despite the obvious benefits of arbitrary sound-meaning mappings.

What we mean by “sound-meaning mapping” is not purely sound symbolism, however, nor is it morphology. In the following paper, we offer a statistical, corpus-based approach to the phonaestheme, a sub-morphemic unit that has a predictable effect on the meaning of a word as a whole. These non-morphological relationships between sound and meaning have not been thoroughly explored by behavioral or computational research, with some notable exceptions (e.g., Hutchins, 1998; Bergen, 2004).

By contrast, sound-syntax mappings are somewhat better documented in the literature. Monaghan, Chater, and Christiansen (2005) address the role of phonetic similarity in separating lexical categories, a construct that necessarily includes some syntactic features and some semantic features (Monaghan et al., 2005).¹

¹ For example, Subject-Verb-Object word order implicates syntax; persons, places, and things (nouns) are semantically different from actions and states of being (verbs).

Recent research indicates that systematic sound-meaning and sound-syntax relationships play a role in language processing (Hutchins, 1998; Bergen, 2004; Farmer, Christiansen, & Monaghan, 2006), and may also be important to language learning (Monaghan et al.,

To the degree that it differs from adult-directed speech, child-directed speech should be sensitive to the child’s status as a language learner. Monaghan et al. (2005) tested adult speech from the CHILDES corpus for the presence of 16 phonological cues in open- and closed-class words and for their diagnosticity in determining whether a word is a noun or a verb. Significantly diagnostic cues to the noun/verb distinction were: syllable length, onset and syllabic complexity, syllable reduction, -ed inflection (voiced or unvoiced vowel), vowel position, and vowel height. Furthermore, in an experiment on artificial language learning of bigrams, they found that participants used phonological cues when distributional cues were weak or absent. Since grammatical categories can encompass not only syntactically disparate words but also semantically disparate words, this might indicate that sound-meaning correspondences are a boon to language learners, especially in low-frequency cases.

Farmer, Christiansen, and Monaghan (2006) expanded the research on phonological diagnostics for lexical category membership begun in Monaghan et al. (2005). They performed a regression analysis on over 3,000 monosyllabic English words that significantly associated certain phonological features with an unambiguous interpretation as either a noun or a verb. An associated series of experiments demonstrated reaction time, reading time, and sentence comprehension advantages for phonologically “noun-like nouns” and “verb-like verbs.”

Bergen (2004) used a morphological priming paradigm to test whether there was a processing advantage for words containing phonaesthemes over words that shared only semantic or only formal features, or which contained “pseudo-phonaesthemes.” He found a difference in reaction times between the phonaestheme condition and the other three conditions by comparing primed reaction times

[1] Peter W. Foltz, et al. An introduction to latent semantic analysis, 1998.

[2] V. Ramachandran, et al. Synaesthesia? A window into perception, thought and language, 2001.

[3] B. Bergen. The Psychological Reality of Phonaesthemes, 2004.

[4] J. Nuckolls. The Case for Sound Symbolism, 1999.

[5] Roman Jakobson, et al. The Sound Shape of Language, 1979.

[6] C. F. Hockett. The origin of speech, 1960, Scientific American.

[7] John Wallis, et al. Grammar of the English language: with an introductory grammatico-physical Treatise on speech, or on the formation of all speech sounds, 1972.

[8] Hang Li, et al. Review of Ambiguity resolution in language learning: computational and cognitive models by Hinrich Schütze. CSLI Publications 1997, 1999.

[9] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[10] Morten H. Christiansen, et al. Phonological typicality influences on-line sentence comprehension, 2006, Proceedings of the National Academy of Sciences.

[11] Elsie Fogerty. Speech, 1933, Encyclopedia of Evolutionary Psychological Science.

[12] Hinrich Schütze, et al. Ambiguity resolution in language learning, 1997.

[13] Morten H. Christiansen, et al. The differential role of phonological and distributional cues in grammatical categorisation, 2005, Cognition.

[14] Robert Blust, et al. The Phonestheme ŋ- in Austronesian Languages, 2003.

[15] Thomas L. Griffiths, et al. A probabilistic approach to semantic representation, 2002, Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society.

[16] Susan T. Dumais, et al. The latent semantic analysis theory of knowledge, 1997.