ASNets: A Benchmark Dataset of Aligned Social Networks for Cross-Platform User Modeling

Aligning heterogeneous online social networks is a valuable task that has been proposed in recent years. It aims to automatically align accounts across multiple networks according to whether they are held by the same natural person. Aligning the networks can improve personalized services through cross-platform user modeling, and it is a prerequisite for cross-network analysis. However, because the task is so recent, no public benchmark dataset is currently available. Since performance on this task depends heavily on the dataset, experiments that use different private datasets are not directly comparable. In this paper we therefore propose ASNets, a benchmark dataset with two sets of aligned social networks. With this dataset, different approaches can be evaluated properly and compared fairly. The two sets of aligned networks have 328,224 and 141,614 aligned users respectively, covering multilingual usage (Chinese and English) and various types of social networks, including general-purpose networks, review sites, and microblogging sites. We describe the collection methodology and statistics in detail, and evaluate several state-of-the-art network alignment approaches. Besides introducing the dataset, we propose several potential research directions that benefit from ASNets.
