Understanding the User Display Names across Social Networks

The display names that an individual uses in different online social networks (OSNs) often contain redundant information, because many people choose similar names across networks to make them easier to remember or to build an online reputation. In this paper, we aim to measure the redundant information between different display names of the same individual. Using the cross-site linking function, we first develop a dedicated distributed crawler to extract the display names that individuals select for different social networks, and we give an overview of the extracted display names. We then measure and analyze the redundant information in three ways: length similarity, character similarity, and letter distribution similarity, each compared against display names of different individuals. We also analyze how the redundant information evolves over time. We find that 45% of users use the same display name across OSNs. Our findings also demonstrate that display names of the same individual show high similarity. The evolution analysis shows that the redundant information is time-independent. Awareness of the redundant information between display names can benefit many applications, such as user identification across social networks.
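
The three similarity measures named above can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions here are assumptions chosen for illustration: length similarity as the ratio of the shorter to the longer name length, character similarity as one minus the normalized Levenshtein edit distance, and letter distribution similarity as the cosine similarity of character-frequency vectors. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def length_similarity(a: str, b: str) -> float:
    """Assumed definition: shorter length divided by longer length."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def character_similarity(a: str, b: str) -> float:
    """Assumed definition: 1 - edit distance / length of the longer name."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def letter_distribution_similarity(a: str, b: str) -> float:
    """Assumed definition: cosine similarity of case-folded character counts."""
    ca, cb = Counter(a.lower()), Counter(b.lower())
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical display names of one individual on two networks.
    name_a, name_b = "John Smith", "john_smith88"
    print(length_similarity(name_a, name_b))
    print(character_similarity(name_a, name_b))
    print(letter_distribution_similarity(name_a, name_b))
```

Under these assumed definitions each measure lies in [0, 1], with 1 indicating identical length, identical strings, or identical letter distributions respectively, which makes scores for same-individual pairs directly comparable with scores for pairs from different individuals.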
