A bipartite graph matching framework for finding correspondences between structural elements in two proteins

A protein molecule consists of one or more chains of amino acids that fold into a complex three-dimensional structure. A protein's function is often determined by its 3D structure, so measuring the structural similarity between proteins is an important problem. Such a comparison requires aligning the two proteins by rotation and translation in 3D space. Finding the correspondences between structural elements in the two proteins is the key step in many protein structure alignment algorithms. We introduce a new graph-theoretic framework based on bipartite graph matching for finding sufficiently good correspondences. The framework can provide both sequence-dependent and sequence-independent correspondences, and it is general enough to support pairwise matching of atoms, amino acid residues, or secondary structure elements.
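As a rough illustration of the correspondence step, the sketch below treats the elements of one protein as one vertex set and the elements of the other as the second vertex set, then solves a minimum-cost assignment with the Hungarian method. It is not the framework's actual formulation: the squared-distance edge cost, the assumption that both structures are already roughly superimposed, the function names `hungarian` and `match_elements`, and the coordinates are all hypothetical choices made for the example.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list of (row, column) pairs, one per row, sorted by row.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("expected no more rows than columns")
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)        # p[j] = row currently assigned to column j
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augment
                break
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def match_elements(coords_a, coords_b):
    """Pair each element of A with a distinct element of B.

    Assumption: both coordinate sets are already in a common frame,
    and the edge cost is the squared Euclidean distance.
    """
    cost = [[sum((x - y) ** 2 for x, y in zip(a, b)) for b in coords_b]
            for a in coords_a]
    return hungarian(cost)


# Hypothetical C-alpha positions; B lists similar points in a different order.
protein_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
protein_b = [(7.5, 0.2, 0.0), (0.1, 0.0, 0.0), (20.0, 20.0, 20.0), (3.9, 0.1, 0.0)]
pairs = match_elements(protein_a, protein_b)
print(pairs)
```

Because the cost here depends only on geometry, the resulting pairs ignore chain order, which corresponds to the sequence-independent case. A sequence-dependent variant would need an additional ordering constraint on the matched pairs, which this sketch does not attempt.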
