A deep dive into user display names across social networks

The display names an individual uses across Online Social Networks (OSNs) typically contain abundant redundant information, because most users choose one main name or similar names across OSNs to make them easier to remember or to build an online reputation. This redundancy is of great benefit to information fusion across OSNs. In this paper, we measure the information redundancy between different display names of the same individual. Using the cross-site linking function of Foursquare, we first develop a distributed crawler to extract the display names that individuals use on Facebook, Twitter and Foursquare. We construct three display name datasets across the three OSNs and measure the information redundancy in three ways: length similarity, character similarity and letter distribution similarity. We also analyze how the redundant information evolves over time. Finally, we apply the measurement results to user identification across OSNs. We find that (1) more than 45% of users use the same display name across OSNs; (2) the display names of the same individual on different OSNs show high similarity; (3) the information redundancy of display names is time-independent; and (4) the AUC values of user identification based only on display names exceed 0.9 on all three datasets.
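To make the three kinds of measurement concrete, the following is a minimal Python sketch of how length, character and letter distribution similarity between two display names could be computed. The abstract does not give the exact formulas, so every definition here is an assumption chosen for illustration: length similarity as the ratio of the shorter to the longer length, character similarity as the longest common subsequence length divided by the longer length, and letter distribution similarity as one minus the base-2 Jensen-Shannon divergence of the character frequencies. All function names are hypothetical.

```python
import math
from collections import Counter


def length_similarity(a: str, b: str) -> float:
    """Shorter length divided by longer length (assumed definition)."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def character_similarity(a: str, b: str) -> float:
    """LCS length normalized by the longer name (assumed definition)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def letter_distribution_similarity(a: str, b: str) -> float:
    """One minus the base-2 Jensen-Shannon divergence between the
    character frequency distributions of the two names (assumed definition)."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    freq_a, freq_b = Counter(a), Counter(b)
    jsd = 0.0
    for ch in set(freq_a) | set(freq_b):
        p = freq_a[ch] / len(a)
        q = freq_b[ch] / len(b)
        m = (p + q) / 2
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return 1.0 - jsd


if __name__ == "__main__":
    # Hypothetical display names of one person on two sites.
    name_a, name_b = "john smith", "jsmith"
    print(length_similarity(name_a, name_b))
    print(character_similarity(name_a, name_b))
    print(letter_distribution_similarity(name_a, name_b))
```

All three scores fall in the range 0 to 1, with 1 meaning the names are indistinguishable under that measure. The sketch compares raw strings; a real pipeline would probably normalize case, whitespace and Unicode first, which is omitted here.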
