DSSLP: A Distributed Framework for Semi-supervised Link Prediction

Link prediction is widely used in a variety of industrial applications, such as merchant recommendation, fraudulent transaction detection, and so on. However, it’s a great challenge to train and deploy a link prediction model on industrial-scale graphs with billions of nodes and edges. In this work, we present a scalable and distributed framework for semi-supervised link prediction problem (named DSSLP), which is able to handle industrial-scale graphs. Instead of training model on the whole graph, DSSLP is proposed to train on the k-hops neighborhood of nodes in a mini-batch setting, which helps reduce the scale of the input graph and distribute the training procedure. In order to generate negative examples effectively, DSSLP contains a distributed batched runtime sampling module. It implements uniform and dynamic sampling approaches, and is able to adaptively construct positive and negative examples to guide the training process. Moreover, DSSLP proposes a model-split strategy to accelerate the speed of inference process of the link prediction task. Experimental results demonstrate that the effectiveness and efficiency of DSSLP in serval public datasets as well as real-world datasets of industrial-scale graphs.

[1]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[2]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[3]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[4]  M. Newman Clustering and preferential attachment in growing networks. , 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  B. Snel,et al.  Comparative assessment of large-scale data sets of protein–protein interactions , 2002, Nature.

[6]  Jon Kleinberg,et al.  The link prediction problem for social networks , 2003, CIKM '03.

[7]  Lada A. Adamic,et al.  Friends and neighbors on the Web , 2003, Soc. Networks.

[8]  Robert Ackland,et al.  Mapping the U.S. Political Blogosphere: Are Conservative Bloggers More Prominent? , 2005 .

[9]  M. Newman,et al.  Finding community structure in networks using the eigenvectors of matrices. , 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  M. Newman,et al.  Vertex similarity in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[11]  François Fouss,et al.  Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation , 2007, IEEE Transactions on Knowledge and Data Engineering.

[12]  David Liben-Nowell,et al.  The link-prediction problem for social networks , 2007 .

[13]  Guo-Jie Li,et al.  Enhancing the transmission efficiency by edge deletion in scale-free networks. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[14]  M. Newman,et al.  Hierarchical structure and the prediction of missing links in networks , 2008, Nature.

[15]  Hong Cheng,et al.  Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..

[16]  Linyuan Lü,et al.  Predicting missing links via local information , 2009, 0901.0553.

[17]  Thomas L. Griffiths,et al.  Nonparametric Latent Feature Models for Link Prediction , 2009, NIPS.

[18]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[19]  B. Wang,et al.  Information filtering based on transferring similarity. , 2008, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20]  Linyuan Lu,et al.  Link prediction based on local random walk , 2010, 1001.2467.

[21]  Charles Elkan,et al.  Link Prediction via Matrix Factorization , 2011, ECML/PKDD.

[22]  Aaron Q. Li,et al.  Parameter Server for Distributed Machine Learning , 2013 .

[23]  Wiley Interscience Journal of the American Society for Information Science and Technology , 2013 .

[24]  Nicola Barbieri,et al.  Who to follow and why: link prediction with explanations , 2014, KDD.

[25]  Xueyan Jiang,et al.  Reducing the Rank in Relational Factorization Models by Including Observable Patterns , 2014, NIPS.

[26]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[27]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[28]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[29]  Nanning Zheng,et al.  Person Re-identification by Multi-Channel Parts-Based CNN with Improved Triplet Loss Function , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Max Welling,et al.  Variational Graph Auto-Encoders , 2016, ArXiv.

[31]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[32]  Yixin Chen,et al.  Weisfeiler-Lehman Neural Machine for Link Prediction , 2017, KDD.

[33]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[34]  Lan Du,et al.  Leveraging Node Attributes for Incomplete Relational Data , 2017, ICML.

[35]  Kaiqi Huang,et al.  Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[37]  Jure Leskovec,et al.  Representation Learning on Graphs: Methods and Applications , 2017, IEEE Data Eng. Bull..

[38]  Kevin Chen-Chuan Chang,et al.  A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications , 2017, IEEE Transactions on Knowledge and Data Engineering.

[39]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[40]  Palash Goyal,et al.  Graph Embedding Techniques, Applications, and Performance: A Survey , 2017, Knowl. Based Syst..

[41]  Yang Wang,et al.  Network-based prediction of protein interactions , 2018 .

[42]  Yixin Chen,et al.  Link Prediction Based on Graph Neural Networks , 2018, NeurIPS.

[43]  Albert-László Barabási,et al.  Network-based prediction of protein interactions , 2018, Nature Communications.

[44]  Xiaolong Li,et al.  GeniePath: Graph Neural Networks with Adaptive Receptive Paths , 2018, AAAI.

[45]  Alexander Peysakhovich,et al.  PyTorch-BigGraph: A Large-scale Graph Embedding System , 2019, SysML.

[46]  Yuan Qi,et al.  Cash-Out User Detection Based on Attributed Heterogeneous Information Network with a Hierarchical Attention Mechanism , 2019, AAAI.