A family of tractable graph metrics

Important data mining problems such as nearest-neighbor search and clustering admit theoretical guarantees when restricted to objects embedded in a metric space. Graphs are ubiquitous, and clustering and classification over graphs arise in diverse areas, including image processing and social networks. Unfortunately, the popular distance scores used in these applications that scale to large graphs are not metrics and thus come with no guarantees. Classic graph distances such as the chemical distance and the Chartrand-Kubicki-Schultz distance are arguably natural and intuitive, and are indeed metrics, but they are intractable: their computation does not scale to large graphs. We define a broad family of graph distances that includes both the chemical and the Chartrand-Kubicki-Schultz distances, and prove that all of them are metrics. Crucially, we show that our family includes metrics that are tractable. Moreover, we extend these distances to incorporate auxiliary node attributes, which is important in practice, while maintaining both the metric property and tractability.
