Global graph matching using diffusion maps

We present a new algorithm, global positioning graph matching (GPGM), that performs global network alignment between pairs of undirected graphs by minimizing a dissimilarity score over matched vertices. Structural dissimilarities are defined from a random walk over each graph, which provides a robust measure of the global graph topology through the nonlinear manifold learning algorithm known as diffusion maps. Measures of vertex-vertex dissimilarity can be incorporated straightforwardly in a convex combination. We tested the approach on pairwise alignments of the protein-protein interaction networks of Xenopus laevis (frog), Rattus norvegicus (rat), Caenorhabditis elegans (worm), Mus musculus (mouse), and Drosophila melanogaster (fly). When vertex-vertex dissimilarities based on homology scores between protein sequences are incorporated, the performance of GPGM is comparable to that of the IsoRank algorithm (Singh et al., Proc. Natl. Acad. Sci. USA 105(35), 12763 (2008)). When homology information is not included, GPGM discovers superior alignments, making it well suited to graph matching applications in which vertex labels are unknown or undefined.
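As a rough illustration of the matching step described above, the following sketch combines a structural dissimilarity matrix with a vertex-label dissimilarity matrix in a convex combination and then searches for the vertex matching with the smallest total dissimilarity. It is not the authors' implementation: the weight `alpha`, the function names, and the toy matrices are all assumptions made for the example, the structural matrix is taken as given rather than computed from diffusion maps, and exhaustive search stands in for an efficient assignment solver, so it is only usable on very small graphs.

```python
from itertools import permutations


def combine(structural, label, alpha):
    """Entrywise convex combination: alpha * structural + (1 - alpha) * label.

    `alpha` is a hypothetical weight in [0, 1]; which term it multiplies
    is an assumption of this sketch, not taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [alpha * s + (1.0 - alpha) * h for s, h in zip(s_row, h_row)]
        for s_row, h_row in zip(structural, label)
    ]


def best_matching(cost):
    """Return (matching, total) minimizing the summed cost of matched pairs.

    matching[i] is the vertex of the second graph assigned to vertex i of
    the first graph. Exhaustive search over all permutations costs O(n!),
    so this is a toy stand-in for a polynomial-time assignment solver.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Invented toy data for two 3-vertex graphs (rows: graph A, columns: graph B).
S = [[0.1, 0.9, 0.8],   # structural dissimilarity, assumed precomputed
     [0.7, 0.2, 0.9],
     [0.9, 0.8, 0.3]]
H = [[0.9, 0.1, 0.5],   # label (e.g. sequence-based) dissimilarity
     [0.1, 0.9, 0.5],
     [0.5, 0.5, 0.0]]

structure_only, _ = best_matching(combine(S, H, 1.0))
labels_only, _ = best_matching(combine(S, H, 0.0))
blended, _ = best_matching(combine(S, H, 0.5))
```

Setting `alpha` to 1.0 in this sketch corresponds to matching on structure alone, the setting in which vertex labels are unknown or undefined; intermediate values trade off the two sources of information.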
