Connecting users across social media sites: a behavioral-modeling approach

People use various social media for different purposes. The information on an individual site is often incomplete. When sources of complementary information are integrated, a better profile of a user can be built to improve online services such as verifying online information. To integrate these sources of information, it is necessary to identify individuals across social media sites. This paper aims to address the cross-media user identification problem. We introduce a methodology (MOBIUS) for finding a mapping among identities of individuals across social media sites. It consists of three key components: the first component identifies users' unique behavioral patterns that lead to information redundancies across sites; the second component constructs features that exploit information redundancies due to these behavioral patterns; and the third component employs machine learning for effective user identification. We formally define the cross-media user identification problem and show that MOBIUS is effective in identifying users across social media sites. This study paves the way for analysis and mining across social media sites, and facilitates the creation of novel online services across sites.
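The three components described above can be illustrated with a minimal sketch. This abstract does not specify which behavioral patterns, features, or learning algorithm MOBIUS uses, so everything concrete here is an assumption made for illustration: usernames are taken as the data source, the four features are simple stand-ins for "information redundancy", logistic regression stands in for the learning step, and the names `redundancy_features`, `train_logistic`, and `predict_proba` as well as the toy data are hypothetical.

```python
import math
from difflib import SequenceMatcher


def redundancy_features(candidate, prior_names):
    """Compare a candidate username with names an individual is known to own.

    Assumes prior_names is non-empty. These features are illustrative
    stand-ins, not the feature set of the paper.
    """
    cand = candidate.lower()
    priors = [p.lower() for p in prior_names]
    # Exact reuse of an earlier username.
    exact = 1.0 if cand in priors else 0.0
    # Best string similarity to any prior name (small edits of an old name).
    best_sim = max(SequenceMatcher(None, cand, p).ratio() for p in priors)
    # Smallest relative length gap, in [0, 1] (habitual name length).
    len_gap = min(abs(len(cand) - len(p)) / max(len(cand), len(p), 1)
                  for p in priors)
    # Jaccard overlap of character sets (habitual alphabet).
    chars = set("".join(priors))
    overlap = len(set(cand) & chars) / max(len(set(cand) | chars), 1)
    return [exact, best_sim, len_gap, overlap]


def train_logistic(X, y, lr=0.5, epochs=500):
    """Plain stochastic gradient descent for logistic regression."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def predict_proba(w, x):
    """Estimated probability that the candidate belongs to the individual."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical labeled pairs: (candidate, prior usernames, same person?).
train = [
    ("jsmith84", ["jsmith84", "john.smith"], 1),
    ("j_smith", ["jsmith84", "john.smith"], 1),
    ("kittenlover", ["kitten_lover", "kittenlover99"], 1),
    ("xx_dragon_xx", ["jsmith84", "john.smith"], 0),
    ("marie.curie", ["kitten_lover", "kittenlover99"], 0),
    ("qwerty123", ["jsmith84", "john.smith"], 0),
]
X = [redundancy_features(c, p) for c, p, _ in train]
y = [label for _, _, label in train]
w = train_logistic(X, y)

priors = ["jsmith84", "john.smith"]
p_similar = predict_proba(w, redundancy_features("johnsmith", priors))
p_unrelated = predict_proba(w, redundancy_features("zebra_4477", priors))
```

In this toy setting the classifier should score `johnsmith` as a more likely match than `zebra_4477` for the same individual. A realistic system would need far more labeled pairs and a richer feature set than the four used here.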
