Graphlets versus node2vec and struc2vec in the task of network alignment

Network embedding aims to represent each node in a network as a low-dimensional feature vector that summarizes the node's (extended) network neighborhood. The nodes' feature vectors can then be used in various downstream machine learning tasks. Recently, many embedding methods that automatically learn the features of nodes have emerged, such as node2vec and struc2vec. These have been used in tasks such as node classification, link prediction, and node clustering, mainly in the social network domain. Other embedding methods, such as graphlets, instead explicitly examine the connections between nodes, i.e., the nodes' network neighborhoods. Graphlets have been used in many tasks such as network comparison, link prediction, and network clustering, mainly in the computational biology domain. Even though the two types of embedding methods (node2vec/struc2vec versus graphlets) have a similar goal, namely to represent nodes as feature vectors, no comparisons have been made between them, possibly because they originated in different domains. Therefore, in this study, we compare graphlets to node2vec and struc2vec, and we do so in the task of network alignment. In evaluations on synthetic and real-world biological networks, we find that graphlets are both more accurate and faster than node2vec and struc2vec.
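To make the notion of a graphlet-based node feature vector concrete, the following is a minimal sketch, not the implementation evaluated in this study. It counts, for each node, only the four orbits of the 2- and 3-node graphlets (node degree, end of an induced 3-node path, middle of an induced 3-node path, and triangle), whereas graphlet-based methods typically use larger graphlets as well. The function names `graphlet_orbit_counts` and `feature_distance`, the toy network, and the use of Euclidean distance as a node dissimilarity are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist

def graphlet_orbit_counts(adj):
    """Return {node: (degree, path_end, path_middle, triangle)}.

    adj maps each node to the set of its neighbors in an undirected,
    simple network (no self-loops, no multi-edges).
    """
    feats = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Triangles through v: pairs of v's neighbors that are adjacent.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # v is the middle of an induced path for each non-adjacent neighbor pair.
        path_mid = deg * (deg - 1) // 2 - tri
        # v is an end of an induced path v-u-w when w is not adjacent to v;
        # each triangle through v is counted twice in the sum, so subtract 2*tri.
        path_end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
        feats[v] = (deg, path_end, path_mid, tri)
    return feats

def feature_distance(x, y):
    """Euclidean distance between two feature vectors (illustrative choice)."""
    return dist(x, y)

# Toy network (hypothetical): a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
feats = graphlet_orbit_counts(adj)
for node in sorted(feats):
    print(node, feats[node])
```

In this toy network, nodes a and b occupy topologically equivalent positions and therefore receive identical feature vectors, so their distance is zero. In an alignment setting, the same kind of distance could be computed between nodes of two different networks to score candidate node pairs.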
