Aligning Entity Names with Online Aliases on Twitter

This paper presents new models that automatically align online aliases with the real names of the entities they refer to. Many research applications rely on identifying entity names in text, but people often refer to entities by unexpected nicknames and aliases. For example, "The King" and "King James" are aliases for LeBron James, a professional basketball player. Recent work on entity linking attempts to resolve mentions to knowledge base entries, such as a Wikipedia page, but such linking is limited to well-known entities that already have pages. This paper asks a more basic question: can aliases be aligned without background knowledge of the entity? Further, can the semantic context surrounding alias mentions inform these alignments? We describe statistical models that make decisions based on the lexicographic properties of the aliases combined with their semantic context in a large corpus of tweets. We experiment on a database of Twitter users and their usernames, and present the first human evaluation for this task. Alignment accuracy approaches human performance at 81%, and we show that while lexicographic features are the most important, the semantic context of an alias further improves classification accuracy.
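
The abstract does not spell out the exact features or classifier, so the following is only a minimal sketch of how lexicographic and contextual evidence might be combined, not the authors' method. It assumes character trigram overlap and token overlap as stand-ins for lexicographic features, an average of word vectors over the words around a mention as the semantic context, and a logistic combination of the three scores. All function names, the toy embeddings, and the weights are hypothetical; in practice the weights would be learned from labeled alias/name pairs.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased string padded with '#'."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_jaccard(alias: str, name: str, n: int = 3) -> float:
    """Hypothetical lexical feature: Jaccard overlap of character n-gram sets."""
    a, b = set(char_ngrams(alias, n)), set(char_ngrams(name, n))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def token_overlap(alias: str, name: str) -> float:
    """Hypothetical lexical feature: share of the name's tokens found in the alias."""
    alias_tokens = set(alias.lower().split())
    name_tokens = set(name.lower().split())
    return len(alias_tokens & name_tokens) / len(name_tokens) if name_tokens else 0.0


def context_vector(tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the embeddings of context tokens that have one (all zeros if none do)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def alignment_score(alias: str, name: str, alias_ctx: Sequence[float],
                    name_ctx: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic combination of two lexical features and one context feature."""
    features = [
        ngram_jaccard(alias, name),
        token_overlap(alias, name),
        cosine(alias_ctx, name_ctx),
    ]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy 2-d embeddings and hand-set weights, purely for illustration.
    emb = {"basketball": [1.0, 0.0], "nba": [0.9, 0.1], "pizza": [0.0, 1.0]}
    name_ctx = context_vector(["basketball", "nba"], emb, dim=2)
    weights, bias = [2.0, 2.0, 1.5], -2.0
    good = alignment_score("King James", "LeBron James",
                           context_vector(["nba"], emb, 2), name_ctx, weights, bias)
    bad = alignment_score("The King", "LeBron James",
                          context_vector(["pizza"], emb, 2), name_ctx, weights, bias)
    print(f"King James: {good:.3f}  The King: {bad:.3f}")
```

In this toy setup, "King James" shares both surface strings and basketball context with the name, so it scores higher than "The King" in an unrelated context. Note that "The King" has no lexical overlap with "LeBron James" at all, which illustrates why context is needed for aliases that lexical features alone cannot catch.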
