SocialVec: Social Entity Embeddings

This paper introduces SocialVec, a general framework for eliciting social world knowledge from social networks, and applies this framework to Twitter. SocialVec learns low-dimensional embeddings of popular accounts, which represent entities of general interest, based on their co-occurrence patterns within the accounts followed by individual users, thus modeling entity similarity in socio-demographic terms. Just as word embeddings facilitate text processing tasks, we expect social entity embeddings to benefit tasks of a social nature. We learn social embeddings for roughly 200,000 popular accounts from a sample of the Twitter network that includes more than 1.3 million users and the accounts that they follow, and we evaluate the resulting embeddings on two tasks. The first task is the automatic inference of personal traits of users from their social media profiles. In the second task, we use SocialVec embeddings to gauge the political bias of news sources on Twitter. In both cases, we show that SocialVec embeddings are advantageous compared with existing entity embedding schemes. We will make the SocialVec entity embeddings publicly available to support further exploration of social world knowledge as reflected on Twitter.
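The idea of deriving entity similarity from co-occurrence within users' followed accounts can be illustrated with a minimal sketch. It is not the SocialVec training procedure: it swaps in a simple count-based alternative (positive pointwise mutual information vectors compared by cosine similarity) rather than learned low-dimensional embeddings. The toy `follows` data, the account names, and the helper names `ppmi_vectors` and `cosine` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: each user maps to the popular accounts they follow.
follows = {
    "user_a": ["nytimes", "cnn", "nasa"],
    "user_b": ["nytimes", "cnn"],
    "user_c": ["espn", "nba"],
    "user_d": ["espn", "nba", "nasa"],
}


def ppmi_vectors(follow_lists):
    """Build one PPMI vector per account from per-user follow lists.

    Assumption for this sketch: an account co-occurs with itself, so the
    diagonal is counted too. This keeps vectors of accounts that always
    appear together identical instead of orthogonal.
    """
    accounts = sorted({a for lst in follow_lists for a in lst})
    single = Counter()  # number of users following each account
    pair = Counter()    # number of users following both accounts
    for lst in follow_lists:
        uniq = set(lst)
        single.update(uniq)
        for a in uniq:
            for b in uniq:
                pair[(a, b)] += 1
    n = len(follow_lists)
    vectors = {}
    for a in accounts:
        row = []
        for b in accounts:
            c = pair[(a, b)]
            if c == 0:
                row.append(0.0)
            else:
                pmi = math.log((c * n) / (single[a] * single[b]))
                row.append(max(pmi, 0.0))  # keep only positive associations
        vectors[a] = row
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = ppmi_vectors(list(follows.values()))
news_similarity = cosine(vectors["cnn"], vectors["nytimes"])
cross_similarity = cosine(vectors["cnn"], vectors["espn"])
print(news_similarity, cross_similarity)
```

In this toy sample, accounts followed by the same users end up with similar vectors, while accounts with disjoint audiences do not. At the scale described above (about 200,000 accounts and 1.3 million users), dense count vectors of this kind would be impractical, which is one motivation for learning low-dimensional embeddings instead.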
