Structural invariants and semantic fingerprints in the “ego network” of words

Well-established cognitive models from anthropology have shown that, because cognitive constraints limit our “bandwidth” for social interactions, humans organise their social relations according to a regular structure. In this work, we postulate that similar regularities can be found in other cognitive processes, such as those involving language production. To investigate this claim, we analyse a dataset of tweets from a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the social cognitive constraints, we find regularities at both the structural and the semantic level. At the structural level, we find that a concentric layered structure (which we call the ego network of words, in analogy to the ego network of social relationships) captures well how individuals organise the words they use. Moving outwards, the size of each layer grows regularly (approximately 2-3 times the size of the previous one), and the two penultimate external layers consistently account for approximately 60% and 30% of the words used, irrespective of how many layers a user's ego network has. For the semantic analysis, each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in that ring. We find that ring #1 has a special role in the model: it is semantically the most dissimilar from the other rings and the most diverse of all rings. We also show that the topics that are important in the innermost ring are also predominant in each of the other rings, as well as in the entire ego network. In this respect, ring #1 can be seen as the semantic fingerprint of the ego network of words.
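The structural quantities mentioned in the abstract (growth of layer sizes and the share of words held by each ring) can be illustrated with a minimal Python sketch. It assumes, following the usual convention for social ego networks, that layers are nested (each layer contains all inner layers) and that a ring is the part of a layer not contained in the previous one; the abstract itself does not spell this out. The function names and the example layer sizes are hypothetical and are not data from the paper.

```python
def rings_from_layers(layer_sizes):
    """Turn cumulative (nested) layer sizes into ring sizes.

    Assumption: layer_sizes[i] counts all words in layer i, including
    those of the inner layers; a ring holds only the words new to that layer.
    """
    rings = [layer_sizes[0]]
    for inner, outer in zip(layer_sizes, layer_sizes[1:]):
        rings.append(outer - inner)
    return rings


def scaling_ratios(layer_sizes):
    """Ratio between the size of each layer and the size of the previous one."""
    return [outer / inner for inner, outer in zip(layer_sizes, layer_sizes[1:])]


def ring_shares(layer_sizes):
    """Fraction of the ego's total vocabulary that falls in each ring."""
    total = layer_sizes[-1]
    return [ring / total for ring in rings_from_layers(layer_sizes)]


# Hypothetical cumulative layer sizes for one user (innermost first).
example_layers = [10, 25, 70, 200]

if __name__ == "__main__":
    print("ring sizes:", rings_from_layers(example_layers))
    print("scaling ratios:", scaling_ratios(example_layers))
    print("ring shares:", ring_shares(example_layers))
```

With these made-up sizes, each layer is between 2 and 3 times larger than the previous one, which is the kind of regularity the abstract reports; the actual layer sizes and the clustering procedure used to obtain them are described in the paper, not here.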
