Matching user accounts across social networks based on username and display name

Matching user accounts across social networks helps build better user profiles, which has practical significance for many applications, and the problem has attracted considerable scholarly attention. Existing work relies mainly on rich online profiles or activities. However, because of privacy settings or other specific purposes, such rich online data is usually unavailable, incomplete, or unreliable, so existing schemes fail to work properly. Users often make their display names and/or usernames public on different social networks. The names belonging to the same user often contain abundant redundant information, which provides an opportunity to address the matching problem. In this paper, we focus on matching user accounts across social networks solely based on username and display name. The problem is two-fold: 1) how to characterize the information redundancies contained in usernames and display names; 2) how to match user accounts based on these redundancies. To address this problem, we propose a solution to User Identification across Social Network based on Username and Display name (UISN-UD), which consists of three key components: 1) extracting features that exploit the information redundancies among names based on user naming habits; 2) training a two-stage classification framework to tackle the user identification problem based on the extracted features; 3) employing the Gale-Shapley algorithm to eliminate the one-to-many or many-to-many relationships that exist in the identification results. We perform experiments on real social network datasets, and the results show that the proposed method performs well, with F1 values reaching 90%+. From a computational point of view, comparing display names and/or usernames is more convenient than comparing the rich profile attributes or activities of two accounts. This work shows that user accounts can be matched using a small amount of highly accessible online data.
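As a rough illustration of the third component, the sketch below shows how a Gale-Shapley style procedure can reduce one-to-many candidate matches to a one-to-one assignment. It is a minimal sketch under stated assumptions, not the UISN-UD implementation: the function names, the score dictionary layout, and the use of a plain string similarity in place of the paper's extracted features and two-stage classifier are all hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in feature: case-insensitive similarity in [0, 1].

    The paper's own features are based on user naming habits and are
    not reproduced here.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def gale_shapley(scores: dict[str, dict[str, float]]) -> dict[str, str]:
    """Return a one-to-one matching {account_a: account_b}.

    scores[a][b] is an assumed match score between account a on one
    network and account b on the other (higher = more likely the same
    user). Accounts on side A propose; each B account keeps the
    proposer with the highest score seen so far.
    """
    # Each A account ranks its candidate B accounts by descending score.
    prefs = {a: sorted(row, key=row.get, reverse=True) for a, row in scores.items()}
    next_choice = {a: 0 for a in scores}  # index of the next B to propose to
    engaged: dict[str, str] = {}          # b -> a currently holding b
    free = list(scores)

    while free:
        a = free.pop()
        if next_choice[a] >= len(prefs[a]):
            continue  # a has no candidates left and stays unmatched
        b = prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = engaged.get(b)
        if current is None:
            engaged[b] = a
        elif scores[a][b] > scores[current][b]:
            engaged[b] = a        # b prefers the new proposer
            free.append(current)  # the displaced account proposes again
        else:
            free.append(a)        # rejected; a tries its next candidate

    return {a: b for b, a in engaged.items()}


if __name__ == "__main__":
    # Hypothetical usernames on network A and display names on network B.
    side_a = ["john_smith88", "jsmith_photo"]
    side_b = ["John Smith", "J. Smith Photography"]
    scores = {a: {b: name_similarity(a, b) for b in side_b} for a in side_a}
    print(gale_shapley(scores))
```

In this sketch both sides rank each other by the same score, so the result is a stable matching with respect to that score. In practice one would presumably feed in classifier outputs rather than raw string similarity and only include candidate pairs the classifier accepted.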
