COSNET: Connecting Heterogeneous Social Networks with Local and Global Consistency

More often than not, people are active in more than one social network. Identifying users across multiple heterogeneous social networks and integrating those networks is a fundamental issue in many applications. Existing methods tackle this problem by estimating pairwise similarity between users in two networks. However, those methods suffer from potential inconsistency of matchings between multiple networks. In this paper, we propose COSNET (COnnecting heterogeneous Social NETworks with local and global consistency), a novel energy-based model that addresses this problem by considering both local and global consistency among multiple networks. An efficient subgradient algorithm is developed to train the model by converting the original energy-based objective function into its dual form. We evaluate the proposed model on two different genres of data collections, SNS and Academia, each consisting of multiple heterogeneous social networks. Our experimental results validate the effectiveness and efficiency of the proposed model. On both data collections, the proposed COSNET method significantly outperforms several alternative methods, by 10-30% (p ≪ 0.001, t-test) in terms of F1-score. We also demonstrate that applying the integration results produced by our method can improve the accuracy of expert finding, an important task in social networks.
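To make the inconsistency problem concrete, here is a minimal sketch of one way pairwise matchings across three networks can disagree: account A is matched to B and B to C, but A is not matched to C. This is an illustration only and not the COSNET model; the function name `missing_transitive_links`, the `(network, user)` tuple representation, and the example account names are all assumptions made for this sketch.

```python
from itertools import combinations


def missing_transitive_links(matches):
    """Return cross-network pairs implied by transitivity but absent from `matches`.

    `matches` is an iterable of 2-tuples of distinct accounts, where each
    account is a hypothetical (network, user) tuple.
    """
    # Normalise every match to an unordered pair of accounts.
    pairs = {frozenset(m) for m in matches}

    # Build an adjacency map: account -> accounts it is matched to.
    neighbours = {}
    for a, b in (tuple(p) for p in pairs):
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    missing = set()
    for linked in neighbours.values():
        # Two accounts matched to the same hub should also match each other,
        # provided they live in different networks.
        for a, c in combinations(sorted(linked), 2):
            if a[0] != c[0] and frozenset((a, c)) not in pairs:
                missing.add(frozenset((a, c)))
    return missing


# Hypothetical pairwise predictions from three independent two-network matchers.
matches = [
    (("twitter", "alice"), ("linkedin", "alice_w")),
    (("linkedin", "alice_w"), ("flickr", "aw")),
]
inconsistent = missing_transitive_links(matches)
print(inconsistent)
```

A purely pairwise matcher has no mechanism to prevent such gaps, which is the motivation the abstract gives for modelling global consistency jointly rather than checking it after the fact.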
