Identifying Users across Different Sites using Usernames

Identifying users across different sites means finding the accounts that belong to the same individual. The problem is fundamental, and its results can benefit many applications such as social recommendation. Observing that 1) usernames are essential elements of all sites; 2) most users have a limited number of usernames on the Internet; and 3) usernames carry information that reflects an individual's characteristics and habits, this paper tries to identify users based on username similarity. Specifically, we introduce the self-information vector model to integrate our proposed content and pattern features, extracted from usernames, into vectors. We define the similarity between two usernames as the cosine similarity between their self-information vectors. We further propose an abbreviation detection method to discover the initialism phenomenon in usernames, which can improve our user identification results. Experimental results on real-world username sets show that we achieve 86.19% precision, 68.53% recall and 76.21% F1-measure on average, which outperforms state-of-the-art work.
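
As a rough illustration of the similarity measure described above, the following sketch weights each username feature by its self-information, -log2 p(f), and compares two usernames by the cosine of the resulting vectors. The abstract does not spell out the content and pattern features, so character bigrams stand in for them here; the function names, the document-frequency probability estimate and the `floor` probability for unseen features are all assumptions made for this example, not the paper's method.

```python
import math
from collections import Counter


def bigrams(username):
    """Hypothetical stand-in feature extractor: lowercase character bigrams."""
    name = username.lower()
    return {name[i:i + 2] for i in range(len(name) - 1)}


def feature_probabilities(corpus):
    """Estimate p(f) as the fraction of corpus usernames that contain feature f."""
    counts = Counter()
    for name in corpus:
        counts.update(bigrams(name))
    total = len(corpus)
    return {feature: count / total for feature, count in counts.items()}


def self_information_vector(username, probs, floor):
    """Sparse vector mapping each feature of the username to -log2 p(f).

    `floor` is an assumed probability for features never seen in the corpus.
    """
    return {f: -math.log2(probs.get(f, floor)) for f in bigrams(username)}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[f] for f, weight in u.items() if f in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def username_similarity(a, b, probs, floor):
    """Similarity of two usernames as the cosine of their self-information vectors."""
    return cosine_similarity(
        self_information_vector(a, probs, floor),
        self_information_vector(b, probs, floor),
    )
```

Under this weighting, rare features shared by two usernames contribute more to the similarity than common ones, which is the intuition behind using self-information rather than raw feature counts.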
