User Identification Across Social Media

People use various social media sites for different purposes. The information available on each site is often partial. When sources of complementary information are integrated, a better profile of a user can be built. This profile can help improve online services such as advertising across sites. To integrate these sources of information, it is necessary to identify individuals across social media sites. This paper aims to address the cross-media user identification problem. We provide evidence on the existence of a mapping among identities of individuals across social media sites, study the feasibility of finding this mapping, and illustrate and develop means for finding this mapping. Our studies show that effective approaches that exploit information redundancies due to users’ unique behavioral patterns can be utilized to find such a mapping. This study paves the way for analysis and mining across social networking sites, and facilitates the creation of novel online services across sites. In particular, recommending friends and advertising across networks, analyzing information diffusion across sites, and studying specific user behavior such as user migration across sites in social media are among the many areas that can benefit from the results of this study.
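As a rough illustration of how redundant information can link identities across sites, here is a minimal sketch. It assumes, purely for illustration, that each identity is reduced to a username and that overlapping character bigrams are the redundant signal, scored with Jaccard similarity. The function names, the 0.5 threshold, and the sample usernames are hypothetical, and this is not presented as the method developed in the paper, which the abstract does not detail.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a lowercased string (hypothetical feature)."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(site_a: list[str], site_b: list[str],
                 threshold: float = 0.5) -> dict[str, str]:
    """Map each identity on site A to its most similar identity on site B,
    keeping only pairs whose similarity reaches the (assumed) threshold."""
    matches: dict[str, str] = {}
    for ua in site_a:
        scored = [(jaccard(char_ngrams(ua), char_ngrams(ub)), ub) for ub in site_b]
        if not scored:
            continue
        score, ub = max(scored)  # highest similarity wins
        if score >= threshold:
            matches[ua] = ub
    return matches


if __name__ == "__main__":
    print(best_matches(["jsmith_84", "alice.wonder"],
                       ["jsmith84", "bob_builder", "alicewonder"]))
```

With these sample names, `jsmith_84` pairs with `jsmith84` and `alice.wonder` with `alicewonder`, because most of their bigrams are shared. A single string feature is easy to fool; a realistic approach would combine many behavioral features and learn how to weight them rather than rely on a fixed threshold.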
