Semi-Supervised Variational User Identity Linkage via Noise-Aware Self-Learning

User identity linkage, which aims to link the identities of a natural person across different social platforms, has attracted increasing research interest. Existing approaches usually first embed the identities as deterministic vectors in a shared latent space and then learn a classifier from the available annotations. However, the formation and characteristics of real-world social platforms are full of uncertainty, which makes these deterministic embedding-based methods sub-optimal. In addition, it is intractable to collect sufficient linkage annotations because of the large gaps between platforms. Semi-supervised models, which use unlabeled data to help capture the intrinsic data distribution, are therefore more promising in practice. However, existing semi-supervised linkage methods rely heavily on heuristically defined similarity measures to capture the innate closeness between labeled and unlabeled samples. Such manually designed assumptions may be inconsistent with the actual linkage signals and can introduce noise. To address these limitations, we propose a novel Noise-aware Semi-supervised Variational User Identity Linkage (NSVUIL) model. Specifically, we first propose a supervised linkage module to incorporate the available annotations. Each social identity is represented by a Gaussian distribution in the Wasserstein space, which simultaneously preserves fine-grained social profiles and models the uncertainty of identities. Then, a noise-aware self-learning module is designed to faithfully augment the few available annotations by filtering noise from the pseudo-labels generated by the supervised module. The filtered, reliable candidates are added to the labeled set to provide enhanced guidance for the next training iteration. Empirically, we evaluate NSVUIL on multiple real-world datasets, and the experimental results demonstrate its superiority.
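
To make the Gaussian-in-Wasserstein-space representation concrete, the following minimal sketch computes the 2-Wasserstein distance between two Gaussian identity embeddings. It assumes diagonal covariances, for which the distance has a simple closed form; the abstract does not state the covariance structure NSVUIL uses, and the function name `w2_diag_gaussian` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def w2_diag_gaussian(
    mu_a: Sequence[float],
    sigma_a: Sequence[float],
    mu_b: Sequence[float],
    sigma_b: Sequence[float],
) -> float:
    """2-Wasserstein distance between N(mu_a, diag(sigma_a^2)) and
    N(mu_b, diag(sigma_b^2)).

    Assumption (not from the abstract): covariances are diagonal, so
    W2^2 = ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2,
    where sigma_* are per-dimension standard deviations.
    """
    if not (len(mu_a) == len(sigma_a) == len(mu_b) == len(sigma_b)):
        raise ValueError("all mean and std vectors must have the same length")
    if any(s < 0 for s in list(sigma_a) + list(sigma_b)):
        raise ValueError("standard deviations must be non-negative")

    # Squared distance between the means (location of each identity).
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    # Squared distance between the standard deviations (uncertainty).
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)


# Hypothetical embeddings of two identities on different platforms.
distance = w2_diag_gaussian([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

Under this assumption, two identities are close only when both their means and their uncertainties agree, which is one way a distributional embedding can carry more information than a single deterministic vector.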
