On the universal structure of human lexical semantics

Significance

Semantics, or meaning expressed through language, provides indirect access to an underlying level of conceptual structure. To what degree this conceptual structure is universal or is due to properties of cultural histories, or to the environment inhabited by a speech community, is still controversial. Meaning is notoriously difficult to measure, let alone parameterize, for quantitative comparative studies. Using cross-linguistic dictionaries across languages carefully selected as an unbiased sample reflecting the diversity of human languages, we provide an empirical measure of semantic relatedness between concepts. Our analysis uncovers a universal structure underlying the sampled vocabulary across language groups independent of their phylogenetic relations, their speakers’ culture, and geographic environment.

How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides indirect access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here, we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries to translate words to and from languages carefully selected to be representative of worldwide diversity. These translations reveal cases where a particular language uses a single “polysemous” word to express multiple concepts that another language represents using distinct words. We use the frequency of such polysemies linking two concepts as a measure of their semantic proximity and represent the pattern of these linkages by a weighted network. This network is highly structured: Certain concepts are far more prone to polysemy than others, and naturally interpretable clusters of closely related concepts emerge.
Statistical analysis of the polysemies observed in a subset of the basic vocabulary shows that these structural properties are consistent across different language groups, and largely independent of geography, environment, and the presence or absence of a literary tradition. The methods developed here can be applied to any semantic domain to reveal the extent to which its conceptual structure is, similarly, a universal attribute of human cognition and language use.
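
The weighted-network construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon, the language and word labels, and the function names `polysemy_network` and `weighted_degree` are hypothetical, and the choice to count each concept pair at most once per language is an assumption made here for simplicity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the study):
# language -> word -> set of concepts that word can express.
TOY_LEXICON = {
    "lang_a": {"w1": {"SUN", "DAY"}, "w2": {"MOON", "MONTH"}, "w3": {"WATER"}},
    "lang_b": {"w1": {"SUN", "DAY"}, "w2": {"MOON"}, "w3": {"MONTH"},
               "w4": {"WATER", "RIVER"}},
    "lang_c": {"w1": {"SUN", "DAY"}, "w2": {"SUN"}, "w3": {"MOON", "MONTH"},
               "w4": {"WATER"}, "w5": {"RIVER"}},
}


def polysemy_network(lexicon):
    """Return {(concept_a, concept_b): weight}, where weight is the number
    of languages in which at least one word expresses both concepts."""
    weights = defaultdict(int)
    for words in lexicon.values():
        linked = set()  # concept pairs linked by some word in this language
        for concepts in words.values():
            for pair in combinations(sorted(concepts), 2):
                linked.add(pair)
        for pair in linked:
            weights[pair] += 1  # assumption: one count per language per pair
    return dict(weights)


def weighted_degree(weights):
    """Sum the edge weights at each concept: a rough indicator of how
    prone that concept is to polysemy in the sample."""
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


network = polysemy_network(TOY_LEXICON)
degrees = weighted_degree(network)
```

In this toy sample the SUN and DAY concepts end up linked with weight 3, because all three hypothetical languages have a word covering both, while WATER and RIVER are linked with weight 1. Other weighting choices, such as counting every polysemous word separately, would be equally easy to substitute.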
