Graph Comparison by Log-Odds Score Matrices with Application to Protein Topology Analysis

A TOPS diagram is a simplified description of the topology of a protein: a graph whose nodes are α-helices and β-strands, and whose edges correspond to chirality relations and to parallel or antiparallel bonds between strands. We present an algorithm for matching two TOPS diagrams in which the likelihood of a match is measured according to previously known matches between complete 3D structures. This training information, derived from 3D structures, is recorded in transition matrices that estimate the likelihood that a given TOPS feature, or combination of features, is replaced by another feature in homologs. The new algorithm outperforms existing ones on a benchmark database. Some biologically significant examples are also discussed. The method can be used whenever frequencies of edge relationship matches are known, as is the case for several biopolymer structures.
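To illustrate how a log-odds score matrix can be derived from replacement counts, the following is a minimal sketch and not the authors' implementation. It assumes a BLOSUM-style construction: observed frequencies of aligned feature pairs are divided by the frequencies expected if features paired at random, and the base-2 logarithm is taken. The function name `log_odds_matrix`, the feature labels `"P"` (parallel) and `"A"` (antiparallel), the pseudocount, and the counts are all hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(pair_counts, pseudocount=1.0):
    """Return symmetric log-odds scores from counts of aligned feature pairs.

    pair_counts maps (feature_a, feature_b) to the number of times the two
    features were aligned in known homolog matches (hypothetical input).
    """
    features = sorted({f for pair in pair_counts for f in pair})

    # Treat pairs as unordered: (a, b) and (b, a) share one count.
    sym = Counter()
    for (a, b), n in pair_counts.items():
        sym[tuple(sorted((a, b)))] += n

    # Pseudocounts keep unseen pairs from producing log(0).
    for i, a in enumerate(features):
        for b in features[i:]:
            sym[(a, b)] += pseudocount

    total = sum(sym.values())

    # Background frequency of each feature; every pair holds two features.
    marginal = Counter()
    for (a, b), n in sym.items():
        marginal[a] += n
        marginal[b] += n
    p = {f: marginal[f] / (2 * total) for f in features}

    scores = {}
    for (a, b), n in sym.items():
        observed = n / total
        # Unordered pairs of distinct features can arise in two orders.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = scores[(b, a)] = math.log2(observed / expected)
    return scores


# Hypothetical counts: edge types are mostly conserved between homologs.
counts = {("P", "P"): 90, ("A", "A"): 90, ("P", "A"): 20}
scores = log_odds_matrix(counts)
```

With counts like these, conserved pairs receive positive scores and replacements receive negative ones, so summing the scores over matched edges rewards matches that resemble those seen in known homologs.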
