Studying User Footprints in Different Online Social Networks

With the growing popularity of online social media services, people now have accounts (sometimes several) on multiple and diverse services such as Facebook, LinkedIn, Twitter, and YouTube. Publicly available information can be used to create a digital footprint of any user of these services. Generating such digital footprints can be very useful for personalization, profile management, and detecting malicious user behavior. A very important application of analyzing users' online digital footprints is protecting users from the potential privacy and security risks arising from the large volume of publicly available user information. We extracted information about user identities on different social networks through the Social Graph API, FriendFeed, and Profilactic, and collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate each user's digital footprint. We applied context-specific techniques (e.g., Jaro-Winkler similarity, WordNet-based ontologies) to measure the similarity of user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers to disambiguate profiles belonging to the same user on different social networks. User ID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we achieved accuracy, precision, and recall of 98%, 99%, and 96%, respectively.
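
To make the username-matching step concrete, the following is a minimal sketch of Jaro-Winkler similarity in Python. It is the textbook formulation, not necessarily the configuration used in this work: the prefix scaling factor of 0.1, the maximum prefix length of 4, and the function names `jaro` and `jaro_winkler` are assumptions chosen for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults: 0.1 and 4)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * scaling * (1.0 - base)


if __name__ == "__main__":
    # Classic textbook pair: one transposition, shared prefix "MAR".
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

Because the measure rewards a shared prefix, it suits usernames and display names, where a person often reuses the same leading characters across services. In practice, strings would likely be lowercased and stripped of punctuation before comparison; that preprocessing is likewise an assumption here.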
