SmPFT: Social media based profile fusion technique for data enrichment

Abstract

People use different social networking platforms for distinct purposes. The information available on any single micro-blogging site is often partial. A better profile of an individual can be built by amalgamating the complementary information from various sites. Such an enriched profile is useful in a number of online services, e.g., cross-site product marketing and friend recommendation. To integrate profile information, it is essential to identify the same individual across distinct social networking platforms. This study aims to identify identical users across different social media platforms. Existing user profile matching frameworks are restricted to certain social networks because some previously available streaming APIs are no longer offered. Our approach has no such dependency on streaming APIs, as it relies on the uniqueness of usernames that are identical across various social networking sites. We also exploit the information redundancy that arises from an individual's similar behavioral patterns across sites, which can be used during mapping. We evaluated our system over 500 users in a real-time scenario, considering only profiles that generate their content predominantly in English. The dataset comprises over 1.1 million tweets and 0.63 million URLs, of which 35.6% contain geotagged information. Our model identifies 6.3% more identical users than traditional approaches. Several application areas, such as friend recommendation, future place prediction, leader identification, and information diffusion across social media sites, can benefit from the outcome of this work.
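The abstract describes username-based matching followed by fusion of complementary profile fields, but it does not give the matching function. The following is a minimal sketch of that general idea and is not the SmPFT algorithm. It assumes a case-insensitive, normalized longest-common-subsequence (LCS) similarity between usernames, a greedy best-match rule, and a hypothetical acceptance threshold of 0.8. The function names (`lcs_length`, `username_similarity`, `match_accounts`, `fuse_profiles`) and the fusion rule (keep the first profile's non-empty fields and fill gaps from the second) are illustrative assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def username_similarity(u1: str, u2: str) -> float:
    """Case-insensitive LCS length normalized by the longer username, in [0, 1].

    Assumption: this similarity measure is illustrative, not the paper's own.
    """
    a, b = u1.lower(), u2.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def match_accounts(source: list[str], target: list[str],
                   threshold: float = 0.8) -> dict[str, str]:
    """Greedily map each source username to its most similar target username.

    A pair is accepted only if its similarity reaches the (hypothetical) threshold.
    """
    matches: dict[str, str] = {}
    for s in source:
        best, best_score = None, 0.0
        for t in target:
            score = username_similarity(s, t)
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            matches[s] = best
    return matches


def fuse_profiles(p1: dict, p2: dict) -> dict:
    """Merge two matched profiles: p1's non-empty fields win, p2 fills the gaps."""
    fused = dict(p2)
    fused.update({k: v for k, v in p1.items() if v not in (None, "")})
    return fused


if __name__ == "__main__":
    print(match_accounts(["john_doe", "alice99"], ["johndoe", "alice_99", "bob"]))
    print(fuse_profiles({"name": "John", "location": ""},
                        {"location": "London", "bio": "runner"}))
```

With these inputs, `john_doe` maps to `johndoe` and `alice99` maps to `alice_99`. Each pair has a similarity of 7/8 = 0.875. The fused profile takes `name` from the first profile and takes `location` and `bio` from the second. Because the matching is greedy, two source accounts could claim the same target. A one-to-one assignment, such as the Hungarian algorithm, would avoid this. The behavioral-redundancy signals mentioned in the abstract are not modeled here.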
