Words cluster phonetically beyond phonotactic regularities

Recent evidence suggests that cognitive pressures associated with language acquisition and use could affect the organization of the lexicon. On the one hand, consistent with noisy channel models of language (e.g., Levy, 2008), the phonological distance between wordforms should be maximized to avoid perceptual confusability (a pressure for dispersion). On the other hand, a lexicon with high phonological regularity would be simpler to learn, remember and produce (a pressure for clumpiness; e.g., Monaghan et al., 2011). Here we investigate wordform similarity in the lexicon, using measures of word distance (e.g., phonological neighborhood density) to ask whether wordforms show evidence of dispersion or clumpiness. We develop a novel method that compares lexicons to phonotactically controlled baselines, which provide a null hypothesis for how clumpy or sparse wordforms would be if shaped by phonotactics alone. Results for four languages (Dutch, English, German and French) show that the space of monomorphemic wordforms is clumpier than expected under the best chance model according to a wide variety of measures: minimal pairs, average Levenshtein distance and several network properties. This suggests a fundamental drive for regularity in the lexicon that conflicts with the pressure for words to be as phonologically distinct as possible.
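
As a rough illustration of two of the measures named above (minimal pairs and average Levenshtein distance), the following Python sketch computes both over a toy lexicon. It is not the authors' implementation: the function names `levenshtein` and `lexicon_stats`, the toy wordforms, the one-character-per-phoneme encoding, and the definition of a minimal pair as two same-length wordforms at edit distance 1 are all assumptions made for the example. The comparison against phonotactically controlled baselines is not shown.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def lexicon_stats(words):
    """Return (number of minimal pairs, mean pairwise Levenshtein distance).

    Assumption: a minimal pair is two wordforms of equal length that
    differ by exactly one substitution.
    """
    pairs = list(combinations(words, 2))
    dists = [levenshtein(a, b) for a, b in pairs]
    minimal_pairs = sum(
        1 for (a, b), d in zip(pairs, dists) if len(a) == len(b) and d == 1
    )
    return minimal_pairs, sum(dists) / len(dists)


# Hypothetical toy lexicon; each character stands in for one phoneme.
toy_lexicon = ["kat", "bat", "bad", "dog"]
n_minimal_pairs, mean_distance = lexicon_stats(toy_lexicon)
print(n_minimal_pairs, round(mean_distance, 3))
```

Under this sketch, a clumpier lexicon would show more minimal pairs and a lower mean distance than a baseline lexicon of the same size and word lengths.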

[1]  M. H. Kelly,et al.  Using sound to solve syntactic problems: the role of phonology in grammatical category assignments. , 1992, Psychological review.

[2]  Mirjam Ernestus,et al.  Syntactic predictability in the recognition of carefully and casually produced speech. , 2015, Journal of experimental psychology. Learning, memory, and cognition.

[3]  Anne Christophe,et al.  Function Words Constrain On-Line Recognition of Verbs and Nouns in French 18-Month-Olds , 2014 .

[4]  Jennifer E. Arnold,et al.  Reference production: Production-internal and addressee-oriented processes , 2008 .

[5]  Michael S Vitevitch,et al.  The influence of phonological similarity neighborhoods on speech production. , 2002, Journal of experimental psychology. Learning, memory, and cognition.

[6]  Andrew J Aschenbrenner,et al.  The effect of homonymy on learning correctly articulated versus misarticulated words. , 2013, Journal of speech, language, and hearing research : JSLHR.

[7]  P. Jusczyk,et al.  Infants' sensitivity to phonotactic patterns in the native language. , 1994 .

[8]  Alan Nielsen,et al.  The source and magnitude of sound-symbolic biases in processing artificial word material and their implications for language learning and transmission , 2012, Language and Cognition.

[9]  Jill R. Hoover,et al.  An online calculator to compute phonotactic probability and neighborhood density on the basis of child corpora of spoken American English , 2010, Behavior research methods.

[10]  H. Storkel,et al.  Differentiating phonotactic probability and neighborhood density in adult word learning. , 2006, Journal of speech, language, and hearing research : JSLHR.

[11]  Leon Bergen,et al.  Rational integration of noisy evidence and prior semantic expectations in sentence interpretation , 2013, Proceedings of the National Academy of Sciences.

[12]  Morten H. Christiansen,et al.  The arbitrariness of the sign: learning advantages from the structure of the vocabulary. , 2011, Journal of experimental psychology. General.

[13]  D. Steriade Directional asymmetries in place assimilation: a perceptual account , 2001 .

[14]  R. Aslin,et al.  Lexical competition in young children’s word learning , 2007, Cognitive Psychology.

[15]  Mirjam Ernestus,et al.  Distinctive phonological features differ in relevance for both spoken and written word recognition , 2004, Brain and Language.

[16]  William D. Raymond,et al.  Word-internal /t,d/ deletion in spontaneous speech: Modeling the effects of extra-linguistic, lexical, and phonological factors , 2006, Language Variation and Change.

[17]  S. Kita,et al.  Sound symbolism facilitates early verb learning , 2008, Cognition.

[18]  Thomas L. Griffiths,et al.  Bayesian Inference for PCFGs via Markov Chain Monte Carlo , 2007, NAACL.

[19]  B. Bergen The Psychological Reality of Phonaesthemes , 2004 .

[20]  D. Howes,et al.  Zipf's Law and Miller's Random-Monkey Model , 1968 .

[21]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994, Structural analysis in the social sciences.

[22]  Adam Albright,et al.  Feature-based generalisation as a source of gradient acceptability , 2009, Phonology.

[23]  Simon Kirby,et al.  How arbitrary is language? , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[24]  S. Kita,et al.  The sound symbolism bootstrapping hypothesis for language acquisition and language evolution , 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.

[25]  Marc Brysbaert,et al.  Lexique 2 : A new French lexical database , 2004, Behavior research methods, instruments, & computers : a journal of the Psychonomic Society, Inc.

[26]  Sharon Peperkamp,et al.  Asymmetries in the exploitation of phonetic features for word recognition. , 2015, The Journal of the Acoustical Society of America.

[27]  Simon Kirby,et al.  Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language , 2008, Proceedings of the National Academy of Sciences.

[28]  Monica Tamariz,et al.  Exploring systematicity between phonological and context-cooccurrence representations of the mental lexicon , 2008 .

[29]  Edward Gibson,et al.  Word Forms Are Structured for Efficient Use , 2018, Cogn. Sci..

[30]  Marcello Barbieri,et al.  On the Origin of Language , 2010, Biosemiotics.

[31]  Robert Schreuder,et al.  Prosodic cues for morphological complexity in Dutch and English , 2005 .

[32]  A. Christophe,et al.  Learning novel phonological neighbors: Syntactic category matters , 2015, Cognition.

[33]  V. Ferreira,et al.  Don't Talk About Pink Elephants! , 2006, Psychological science.

[34]  E. Sapir A study in phonetic symbolism. , 1929 .

[35]  Laboratoires d'Electronique  An Informational Theory of the Statistical Structure of Language , 2010 .

[36]  C. Habel,et al.  Language , 1931, NeuroImage.

[37]  Peter Graff,et al.  Communicative Efficiency in the Lexicon , 2014 .

[38]  D. Steriade Phonetics in Phonology: The Case of Laryngeal Neutralization , 1999 .

[39]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[40]  Gary S. Dell,et al.  Neighbors in the lexicon: Friends or foes? , 2003 .

[41]  Julia Strand,et al.  Grammatical context constrains lexical competition in spoken word recognition , 2014, Memory & cognition.

[42]  Uriel Cohen Priva Using Information Content to Predict Phone Deletion , 2008 .

[43]  Paul A. Luce,et al.  Neighborhoods of Words in the Mental Lexicon. Research on Speech Perception. Technical Report No. 6. , 1986 .

[44]  Bruce Hayes,et al.  A Maximum Entropy Model of Phonotactics and Phonotactic Learning , 2008, Linguistic Inquiry.

[45]  Steven T. Piantadosi,et al.  The communicative function of ambiguity in language , 2011, Cognition.

[46]  C. F. Hockett The origin of speech. , 1960, Scientific American.

[47]  B. Lindblom,et al.  Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast , 1972 .

[48]  G. Altmann,et al.  Incremental interpretation at verbs: restricting the domain of subsequent reference , 1999, Cognition.

[49]  S. Gahl Time and Thyme Are not Homophones: The Effect of Lemma Frequency on Word Durations in Spontaneous Speech , 2008 .

[50]  Jason Riggle,et al.  Information theoretic approaches to phonological structure: the case of Finnish vowel harmony , 2012 .

[51]  R. Harald Baayen,et al.  A Stochastic Process for Word Frequency Distributions , 1991, ACL.

[52]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[53]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[54]  Alice Turk,et al.  The Smooth Signal Redundancy Hypothesis: A Functional Explanation for Relationships between Redundancy, Prosodic Prominence, and Duration in Spontaneous Speech , 2004, Language and speech.

[55]  Jan P. H. van Santen,et al.  Duration and spectral balance of intervocalic consonants: A case for efficient communication , 2005, Speech Commun..

[56]  P. Luce,et al.  When Words Compete: Levels of Processing in Perception of Spoken Words , 1998 .

[57]  Sharon Peperkamp,et al.  (Non)words, (non)words, (non)words: evidence for a protolexicon during the first year of life. , 2013, Developmental science.

[58]  Susanne Gahl,et al.  Lexical competition in vowel articulation revisited: Vowel dispersion in the Easy/Hard database , 2015, J. Phonetics.

[59]  Noam Chomsky,et al.  Some controversial questions in phonological theory , 1965, Journal of Linguistics.

[60]  R. Harald Baayen,et al.  Word Frequency Distributions , 2001 .

[61]  Julia F. Strand,et al.  Many neighborhoods: Phonological and perceptual neighborhood density in lexical production and perception , 2016 .

[62]  P. Luce Neighborhoods of words in the mental lexicon , 1986 .

[63]  S. Kirby,et al.  Compression and communication in the cultural evolution of linguistic structure , 2015, Cognition.

[64]  Scott A. Jackson,et al.  Functional Load and the Lexicon: Evidence that Syntactic Category and Frequency Relationships in Minimal Lemma Pairs Predict the Loss of Phoneme contrasts in Language Change , 2013, Language and speech.

[65]  Daniel H. Younger,et al.  Recognition and Parsing of Context-Free Languages in Time n^3 , 1967, Inf. Control..

[66]  Michael S. Vitevitch,et al.  The Structure of Phonological Networks across Multiple Languages , 2009, Int. J. Bifurc. Chaos.

[67]  Edward Flemming,et al.  Auditory Representations in Phonology , 2002 .

[68]  Scott A. Jackson,et al.  High functional load inhibits phonological contrast loss: A corpus study , 2013, Cognition.

[69]  Mirjam Ernestus,et al.  Articulatory Planning Is Continuous and Sensitive to Informational Redundancy , 2005, Phonetica.

[70]  S. Kirby,et al.  The emergence of linguistic structure: an overview of the iterated learning model , 2002 .

[71]  S. Roodenrys,et al.  Complex network structure influences processing in long-term and short-term memory. , 2012, Journal of memory and language.

[72]  Gary S. Dell,et al.  Stages in sentence production: An analysis of speech error data , 1981 .

[73]  Holly L Storkel,et al.  Developmental differences in the effects of phonological, lexical and semantic variables on word learning by infants , 2008, Journal of Child Language.

[74]  Joseph Paul Stemberger,et al.  Neighbourhood effects on error rates in speech production , 2004, Brain and Language.

[75]  H. Simon,et al.  ON A CLASS OF SKEW DISTRIBUTION FUNCTIONS , 1955 .

[76]  Alex Thornton,et al.  What is cumulative cultural evolution? , 2018, Proceedings of the Royal Society B: Biological Sciences.

[77]  Ramon Ferrer-i-Cancho,et al.  Information content versus word length in random typing , 2011, ArXiv.

[78]  Adamantios I. Gafos,et al.  The Articulatory Basis of Locality in Phonology , 1999 .

[79]  André Martinet,et al.  Function, Structure, and Sound Change , 1952 .

[80]  Matthew A Goldrick,et al.  A restricted interaction account (RIA) of spoken word production: The best of both worlds , 2002 .

[81]  M KANZER,et al.  The communicative function of the dream. , 1955, The International journal of psycho-analysis.

[82]  G. A. Miller,et al.  An Analysis of Perceptual Confusions Among Some English Consonants , 1955 .

[83]  Richard N Aslin,et al.  Young children's sensitivity to probabilistic phonotactics in the developing lexicon. , 2004, Journal of experimental child psychology.

[84]  Edward Gibson,et al.  Wordform Similarity Increases With Semantic Similarity: An Analysis of 100 Languages , 2016, Cogn. Sci..

[85]  M. Aylett,et al.  Language redundancy predicts syllabic duration and the spectral characteristics of vocalic syllable nuclei. , 2006, The Journal of the Acoustical Society of America.

[86]  Morten H. Christiansen,et al.  From sound to syntax: phonological constraints on children's lexical categorization of new words , 2008, Journal of Child Language.

[87]  G. Miller,et al.  Some effects of intermittent silence. , 1957, The American journal of psychology.

[88]  C. F. Hockett The Quantification of Functional Load , 1967 .

[89]  G. Dell,et al.  A Case-Series Test of the Interactive Two-Step Model of Lexical Access: Evidence from Picture Naming. , 2006 .

[90]  Keith Johnson,et al.  Why reduce? Phonological neighborhood density and phonetic reduction in spontaneous speech , 2012 .

[91]  J. Ohala Sound Symbolism , 2004, Encyclopedia of Slavic Languages and Linguistics Online.

[92]  D. Pisoni,et al.  Phonotactics, Neighborhood Activation, and Lexical Access for Spoken Words , 1999, Brain and Language.

[93]  James S. Magnuson,et al.  The Dynamics of Lexical Competition During Spoken Word Recognition , 2007, Cogn. Sci..

[94]  Ulrike Hahn,et al.  Phoneme similarity and confusability , 2005 .

[95]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[96]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[97]  Simon Kirby,et al.  Iterated Learning: A Framework for the Emergence of Language , 2003, Artificial Life.

[98]  M. Vitevitch What can graph theory tell us about word learning and lexical retrieval? , 2008, Journal of speech, language, and hearing research : JSLHR.

[99]  M. Ruhlen The Origin of Language , 1994 .

[100]  D. Pisoni,et al.  Recognizing Spoken Words: The Neighborhood Activation Model , 1998, Ear and hearing.

[101]  Stanley Wasserman,et al.  Social Network Analysis: Methods and Applications , 1994 .

[102]  Claude E. Shannon,et al.  The mathematical theory of communication , 1950 .

[103]  Dan Jurafsky,et al.  Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. , 2003, The Journal of the Acoustical Society of America.

[104]  Nivedita Mani,et al.  Word-form familiarity bootstraps infant speech segmentation. , 2013, Developmental science.

[105]  Clara D. Martin,et al.  Reconciling Phonological Neighborhood Effects in Speech Production through Single Trial Analysis , 2022 .

[106]  L. Nygaard,et al.  Sound to meaning correspondences facilitate word learning , 2009, Cognition.

[107]  Michael S Vitevitch,et al.  The facilitative influence of phonological similarity and neighborhood frequency in speech production in younger and older adults , 2003, Memory & cognition.

[108]  Holly L. Storkel,et al.  Do children acquire dense neighborhoods? An investigation of similarity neighborhoods in lexical acquisition , 2004, Applied Psycholinguistics.

[109]  J. Elman,et al.  Knowing a lot for one's age: Vocabulary skill and not age is associated with anticipatory incremental sentence interpretation in children and adults. , 2012, Journal of experimental child psychology.

[110]  R. Levy Expectation-based syntactic comprehension , 2008, Cognition.

[111]  R. Cole,et al.  Perceptibility of phonetic features in fluent speech. , 1978, The Journal of the Acoustical Society of America.

[112]  Edward Gibson,et al.  Information content versus word length in natural language: A reply to Ferrer-i-Cancho and Moscoso del Prado Martin [arXiv:1209.1751] , 2013, 1307.6726.

[113]  Holly L Storkel,et al.  A comparison of homonym and novel word learning: the role of phonotactic probability and word frequency , 2005, Journal of Child Language.

[114]  Ferdinand de Saussure Course in General Linguistics , 1916 .

[115]  Edward Flemming Contrast and Perceptual Distinctiveness , 2003 .