NED: An Inter-Graph Node Metric Based On Edit Distance

Node similarity is a fundamental problem in graph analytics. However, similarity between nodes in different graphs (inter-graph nodes) has received little attention so far. Inter-graph node similarity is important for learning a new graph from the knowledge of an existing graph (transfer learning on graphs) and has applications in biological, communication, and social networks. In this paper, we propose NED, a novel distance function based on edit distance for measuring inter-graph node similarity. In NED, two nodes are compared according to their local neighborhood structures, which are represented as unordered k-adjacent trees, without relying on labels or other assumptions. Since computing the tree edit distance on unordered trees is NP-complete, we propose a modified tree edit distance, called TED*, for comparing neighborhood trees. Like the original tree edit distance, TED* is a metric; more importantly, it is computable in polynomial time. Because it is a metric, NED admits efficient indexing and provides interpretable results, and it is shown to perform better than existing approaches on a number of data analysis tasks, including graph de-anonymization. Finally, the efficiency and effectiveness of NED are demonstrated empirically on real-world graphs.
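To make the notion of an unordered, unlabeled neighborhood tree concrete, the following minimal Python sketch extracts a breadth-first tree around a node and compares two such trees by shape only. It is an illustration under stated assumptions, not the paper's implementation: the function names `k_adjacent_tree` and `shape` are hypothetical, `k` is taken here to mean the maximum number of hops from the root (the paper's level convention may differ), and the comparison only tests whether two trees are structurally identical. It does not compute TED*.

```python
from collections import deque


def k_adjacent_tree(adj, root, k):
    """Return the BFS tree of `root`, truncated at depth k, as {node: [children]}.

    `adj` maps each node to an iterable of neighbors. Assumption: k counts
    hops from the root, so k = 1 keeps the root and its direct neighbors.
    """
    children = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue  # do not expand nodes on the last kept level
        for v in adj.get(u, ()):
            if v not in depth:  # first visit fixes v's parent in the BFS tree
                depth[v] = depth[u] + 1
                children[v] = []
                children[u].append(v)
                queue.append(v)
    return children


def shape(children, node):
    """Canonical form of the subtree at `node`, ignoring labels and child order."""
    return tuple(sorted(shape(children, c) for c in children[node]))


# Two small graphs with different node identifiers.
g1 = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
g2 = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}

t1 = k_adjacent_tree(g1, "a", 2)
t2 = k_adjacent_tree(g2, 1, 2)

# Same neighborhood structure despite different labels and child order.
same_structure = shape(t1, "a") == shape(t2, 1)
```

Because `shape` sorts the canonical forms of the children, two trees compare equal exactly when they are isomorphic as unordered rooted trees. That corresponds only to the zero-distance case; a distance such as TED* would additionally quantify how far apart two non-identical trees are.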
