Wordform Similarity Increases With Semantic Similarity: An Analysis of 100 Languages

Although the mapping between form and meaning is often regarded as arbitrary, there are well-known constraints on words that result from functional pressures associated with language use and acquisition. In particular, languages have been shown to encode meaning distinctions in their sound properties, which may be important for language learning. Here, we investigate the relationship between semantic distance and phonological distance in the large-scale structure of the lexicon. Across 100 languages from a diverse array of language families, we show that more semantically similar word pairs are also more phonologically similar. This suggests an important statistical tendency for semantically similar words in a lexicon to be phonologically similar as well, possibly for functional reasons associated with language learning.
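The kind of comparison described above can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes, for illustration only, that phonological distance is approximated by Levenshtein edit distance over word strings, that semantic distance is cosine distance between word vectors, and that the relationship is summarized by a Pearson correlation over all word pairs. The function names and the four-word lexicon with its two-dimensional vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def levenshtein(a, b):
    """Edit distance between two strings (assumed proxy for phonological distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_distance(u, v):
    """One minus cosine similarity (assumed proxy for semantic distance)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def form_meaning_correlation(lexicon):
    """Correlate form distance with meaning distance over all word pairs."""
    pairs = list(combinations(sorted(lexicon), 2))
    form = [levenshtein(a, b) for a, b in pairs]
    meaning = [cosine_distance(lexicon[a], lexicon[b]) for a, b in pairs]
    return pearson(form, meaning)


# Hypothetical toy lexicon: made-up 2-d vectors, not real embeddings.
toy = {
    "glow": [1.0, 0.1],
    "gleam": [0.9, 0.2],
    "stomp": [0.1, 1.0],
    "stamp": [0.2, 0.9],
}
r = form_meaning_correlation(toy)
print(f"form-meaning correlation: {r:.2f}")
```

A positive correlation in this setup means that pairs closer in meaning also tend to be closer in form. On a real lexicon, such a value would need to be compared against a suitable baseline (for example, one obtained by shuffling the assignment of forms to meanings) before drawing any conclusion.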
